1. INTRODUCTION.


This report gives an overview of the results of our research over the duration of the project; detailed
discussions of most results are included in the technical papers appearing as Appendices A-D. As
originally proposed, resource contention problems were decomposed into system-level and
node-level problems. At the system level, we have performed two main tasks: first, a comparative study of simple
load-sharing schemes in distributed real-time systems, and second, development of actual
algorithms to be implemented in practical systems. This work is reported in sections 2 and 3
respectively. At the node level, our objective has been to consider scheduling policies under
real-time constraints. During the course of our work, it became apparent that nodes are often
multiprocessors, where parallelism in task execution is possible. Thus, we have also focused on
this issue, and obtained the results reported in section 4. Overall, our work has resolved many of
the problems identified in the original proposal, as well as generated new ones. In some cases,
there are useful extensions of our results which can be obtained in the future, given the framework
created in this project.

Appendices A-D included in this report are technical papers which have already appeared or
have  been  submitted  to  journals  or  to  conference  proceedings. 

2.  LOAD  SHARING  IN  SOFT  REAL-TIME  DISTRIBUTED  SYSTEMS. 

A  major  focus  of  our  research  during  the  past  contract  year  has  been  on  high-level  load  sharing 
(LS)  schemes  for  a  class  of  distributed  applications  which  are  subject  to  soft  real-time  constraints. 
In  such  real-time  systems,  jobs  generated  at  a  node  in  the  distributed  system  must  complete 
execution  within  a  specified  amount  of  time  after  their  initial  arrival  to  the  system;  otherwise  they 
are  considered  lost.  Examples  of  systems  exhibiting  such  soft  real-time  behavior  include  the 
general  class  of  applications  in  which  a  process  may  spawn  a  number  of  subprocesses  and  then, 
after  a  fixed  amount  of  time,  must  make  a  decision  based  on  the  results  of  the  subprocesses  which 
were able to execute (e.g., a distributed sensor system, in which multiple hypotheses are to be
generated and evaluated). A second application is in distributed industrial process control, where a
failure  to  complete  a  computation  within  a  specified  time  constraint  (due  to  a  momentary  overload 
of  work  at  a  given  node)  may  require  the  initiation  of  an  expensive  recovery  procedure. 

In  these  soft  real-time  systems,  the  primary  performance  metric  is  the  maximization  of  the 
percentage  of  jobs  completed  within  their  specified  time  constraint.  Our  research  on  real-time  LS 
algorithms  has  been  based  on  the  premise  that  simple  real-time  LS  policies  may  perform  as  well  as 
their  more  complex  counterparts.  It  has  been  previously  noted  that  for  non-real-time  systems, 
relatively  simple  decentralized  policies  may  often  provide  effective  load  sharing  in  a  distributed 
system [1]. This observation has motivated our work during the past year, which establishes
complementary results for the case of real-time systems, that is, systems having performance requirements
and evaluation metrics which differ significantly from those of non-real-time systems. We stress
that, as in [1], the goal of the research reported in this section has been neither to propose a
specific real-time load sharing algorithm nor necessarily to develop performance models for
predicting the absolute performance of specific LS approaches, but rather to address the more
fundamental question of the level of complexity required to implement effective load sharing, in
this case in a distributed real-time environment.

2.1.  System  Models  and  Protocols 

Our  model  of  a  distributed  system  consists  of  N  nodes  which  are  interconnected  through  a 
communication  network;  the  network  is  assumed  to  be  logically  fully  connected  in  that  every  node 
can communicate with every other node. A stream of jobs is submitted locally to each node i. We
assume  that  the  nodes  are  heterogeneous  in  the  sense  that  each  node  may  have  a  different  arrival 
rate  of  externally  submitted  jobs,  but  homogeneous  in  the  sense  that  a  job  submitted  at  any  node  in 
the  network  can  be  processed  at  any  other  node  in  the  network;  this  latter  assumption  can  be  easily 
relaxed. 

We are interested in studying LS policies in a soft real-time system, in which a job is lost if it
cannot complete or begin execution (as the case may be) within a given time constraint. If the
deadline cannot be met locally, an LS algorithm may be invoked to transfer the job to another node
which can possibly meet the job's demands. We assume that a job cannot be transferred more than
once, in order to avoid the problem of "thrashing", and assume that a constant delay, d (representing
communication and transfer processing delays), is required to transfer a job from one node to
another. Thus, if a job first arrives at node i with an initial time constraint of K1 and is transferred
to another node j for processing, its new time constraint at node j will be equal to (K1 - d).

Our research has examined two simple approaches:

1. quasi-dynamic load sharing (QDLS)

2. probing

which have been previously studied for non-real-time systems, and has compared their real-time
performance with that of the bounding cases of no load sharing and the theoretically optimum
real-time LS algorithm.

An LS approach can be characterized by its transfer policy and its
location policy. The transfer policy determines whether a job should be transferred
for remote execution. The location policy determines where
(i.e., at which remote node) a transferred job will be executed.

Both  QDLS  and  probing  have  the  same  simple  transfer  policy: 

Transfer  policy  (QDLS  and  probing): 

A  job  is  transferred  from  node  i  to  a  remote  node  if  and  only  if  the  unfinished  workload  of  the 
jobs  currently  at  node  i  exceeds  the  time  constraint  for  the  job.  A  job  will  thus  either  queue  for 
service  at  the  node  at  which  it  initially  arrives  (in  which  case  it  will  be  guaranteed  execution)  or 
will  be  transferred  to  some  remote  node.  We  note  that  the  transfer  policy  decision  is  made 
dynamically,  based  on  the  current  state  of  the  node.  There  are  no  previous  analytic  studies 
which  have  considered  this  transfer  policy  in  a  real-time  environment. 

The  location  policies  of  QDLS  and  probing  are: 

Location  policy  (QDLS): 
If a job is to be transferred, a remote "target" node (to which the job is sent) is chosen
probabilistically and independently of the current state of the remote nodes. Note that QDLS
requires no non-local, dynamic state information. Although this location policy has been
extensively studied for the non-real-time case, no previous analytic work has addressed this
problem in a real-time environment.

Location policy (probing):

When a job is to be transferred, a node probes some specified number of other system nodes
(chosen  at  random)  to  determine  if  one  of  them  can  currently  guarantee  execution  of  this  job, 
i.e.,  has  an  amount  of  unfinished  work  less  than  the  time  constraint  of  the  job  minus  the 
transfer  delay.  A  node  may  probe  up  to  some  maximum  number,  Lp,  (the  probe  limit)  of  other 
nodes.  If  none  of  the  probed  nodes  can  execute  the  job,  the  job  is  lost.  We  note  that  probing 
may  be  considered  a  simplified  form  of  bidding  [2].  The  probing  policy  studied  here  was  first 
analytically  examined  in  [1]  (for  non-real-time  systems)  and  we  follow  their  methodology 
when studying the system-level model (but not the node-level model) of probing.
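
The transfer policy and the two location policies described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation analyzed in the report: the function names, the list `unfinished` of per-node unfinished work, and the uniform choice of the QDLS target are all assumptions made here for the example.

```python
import random

def should_transfer(unfinished_work, time_constraint):
    """Transfer policy: transfer iff local unfinished work exceeds the job's time constraint."""
    return unfinished_work > time_constraint

def qdls_target(origin, n_nodes, rng):
    """QDLS location policy: choose a remote node without looking at remote state.

    A uniform choice is assumed here; the text only says "probabilistically".
    """
    return rng.choice([j for j in range(n_nodes) if j != origin])

def probe_target(origin, unfinished, time_constraint, delay, probe_limit, rng):
    """Probing location policy: probe up to probe_limit randomly chosen remote nodes.

    Returns the first probed node whose unfinished work is below
    time_constraint - delay, or None if no probed node qualifies (job lost).
    """
    others = [j for j in range(len(unfinished)) if j != origin]
    for j in rng.sample(others, min(probe_limit, len(others))):
        if unfinished[j] < time_constraint - delay:
            return j
    return None
```

With a probe limit of 1, `probe_target` inspects a single random node and the job is lost if that node cannot accept it, which gives the same outcome as sending the job there blindly.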

2.2.  Overview  of  Comparative  Study  Results. 

In the course of our research, analytic performance models were developed to study the
performance of the QDLS and probing approaches, as well as the case of no load sharing. The case of
the  theoretically  optimum  LS  algorithm  was  examined  through  simulation.  The  details  of  the 
analysis  are  presented  in  Appendix  A  of  this  report. 

Figure 1, which is discussed in additional detail in [3] (also Appendix A), shows representative
performance results for the QDLS and probing real-time load sharing schemes and compares their
performance with that of the ideal case of perfect-information load sharing and the case of no load
sharing (NLS). In this case, jobs were required to begin execution within the specified time
constraint or were otherwise lost. The performance results for the case in which jobs must
complete execution within the time constraint showed similar behavior and are also discussed in
detail in [3].


The results are for a 20-node system in which the average job service rate was 1 job/second
(exponentially distributed service times) and the network job transfer delay was d = 0.2 secs. The "ideal" case was
modeled as an M/M/20 queueing system with a time constraint of K1. We note that in the M/M/20
system, jobs are scheduled to available processors using complete information about the system
state and incur no transfer delay. Thus, the "ideal" performance bounds shown in the subsequent
results are, in reality, unattainable. We also note that simulations were performed to validate our
analysis. The simulations were performed without many of the assumptions required by the
analysis; the close correspondence between our simulation and analytic results indicates
that reasonable modeling assumptions and approximations were made in the development of our
analytic model of probing.

Figure 1: Performance of the Probing Policy for Lp = 1, 3, 5 under a symmetric load
(time constraint = 5, network delay = 0.1)

The real-time performance of the QDLS and probing approaches is demonstrated in Fig. 1 for
probe limits Lp = 1, 3, 5; the case of Lp = 1 corresponds to the QDLS policy. As expected, the
performance of probing approaches the ideal limit as Lp increases. Note, however, that a relatively
small probing limit (Lp = 5 when K1 = 0.5, an extremely tight time constraint, and Lp = 3 when
K1 = 5.0) results in a real-time performance extremely close to the unachievable upper bound. Also
note that increasing the probing limit beyond a relatively small number can result at best in only a
marginal performance improvement. We may conclude then that, since additional probing beyond
some small probe limit incurs additional overhead, a relatively small probe limit would be sufficient
in practice to implement effective real-time load sharing.

Perhaps more importantly, Fig. 1 provides a quantitative basis for addressing the question of
determining the appropriate level of complexity for LS algorithms. We note that a more complex
approach can at best achieve a performance level falling in the gap between our probing results and
the theoretical optimum. For system parameters of practical interest (i.e., a system loading less
than the physical capacity of the system and time constraints on the order of the service time of a
job), this gap can be seen to be quite small. If the overhead we have not modeled is to be
considered,  the  small  performance  difference  between  probing  and  a  more  complex  approach, 
which  requires  additional  communication  and  computational  overhead,  can  only  become  smaller. 

The  most  important  conclusion  then  to  be  drawn  from  Fig.  1  and  our  additional  results 
discussed  in  Appendix  A  is  that  for  a  relatively  wide  range  of  system  parameters,  the  simple 
approaches  studied  perform  significantly  better  than  the  case  of  no  load  sharing  and  often  perform 
remarkably  close  to  that  of  the  theoretically  optimum  algorithm.  Our  conclusion  thus  complements 
previously-established  results  for  LS  in  non-real-time  systems  [1]:  very  simple  approaches,  which 
use  only  a  minimal  amount  of  state  information  and  have  an  extremely  simple  decision-making 
process  (and  hence  are  simple  to  implement)  are  often  sufficient  to  provide  effective  load  sharing  in 
a  distributed  real-time  computer  system.  A  corollary  then  is  that  for  all  but  the  tightest  of  time 
constraints (e.g., values of the time constraint, K1, less than the average job service time), a more
sophisticated approach towards real-time load sharing can often result in only a small marginal
performance  improvement  over  the  extremely  simple  load  sharing  algorithms. 

3.  ADAPTIVE  LOAD  SHARING  ALGORITHMS  IN  REAL-TIME  DISTRIBUTED 
PROCESSING  SYSTEMS. 

As  in  the  previous  section,  our  concern  here  is  with  distributed  processing  systems  where  jobs  are 
constrained  by  real-time  deadlines.  Thus,  the  performance  objective  is  to  minimize  the  fraction  of 
jobs  that  are  lost  due  to  exceeding  their  deadline.  Our  interest  now,  however,  is  in  actually 
developing  adaptive  algorithms,  which  can  be  incorporated  into  the  system  itself,  and  perform  load 
sharing  on-line.  First,  let  us  identify  three  desired  features  for  the  practical  applicability  of  such 
algorithms:

1. Load sharing schemes should be sufficiently simple so as to incur little overhead and
communication  costs. 

2.  Stochastic  modeling  assumptions  regarding  the  nature  of  job  arrival  and  service  processes 
should  be  minimized  or  eliminated. 

3.  The  algorithms  should  require  little  or  no  information  about  the  parameters  of  the 
distributed  processing  systems  (since  these  parameters  may  be  hard  to  estimate  in  practice, 
as  well  as  subject  to  changes). 

As  we  shall  describe  below,  the  algorithms  we  have  developed  and  investigated  have  been 
designed  so  as  to  satisfy  these  requirements. 

Note  that  load  sharing  algorithms  can  be  categorized  in  terms  of  their  execution  mode 
(centralized or decentralized), and information structure (static or dynamic). A static decentralized
algorithm  satisfies  the  simplicity  requirement,  since  it  can  be  executed  at  each  node  separately  and 
with  no  instantaneous  state  information.  One,  however,  should  expect  dynamic  algorithms  to 
perform  better;  hence,  it  is  important  to  study  the  tradeoff  between  simplicity  (no  state  information) 
and  performance. 

To address the second and third requirements above, our goal has been to exploit developments
in sample-path-based sensitivity analysis techniques within the context of our problem (e.g., [4]-[6]).
In particular, we have used the Perturbation Analysis (PA) methodology, and have sought to
obtain extensions and generalizations that are applicable.

In  section  3.1  below,  we  present  a  static  decentralized  algorithm  we  have  developed  to  solve 
the  load  sharing  problem  with  real-time  constraints.  The  approach  is  similar  to  the  one  used  in 
problems  with  no  real-time  constraints  (e.g.  [7]).  In  section  3.1.1  we  outline  the  algorithm,  and  in 
section  3.1.2,  we  describe  the  PA  estimation  procedure  required  in  implementing  this  algorithm.  In 
section  3.2  we  include  simulation  results  illustrating  the  performance  of  our  algorithm.  In  section 
3.3,  we  discuss  extensions  of  the  algorithm,  including  a  dynamic  version  making  use  of  state 
information.  To  actually  develop  such  an  algorithm,  however,  we  have  to  derive  extensions  of  our 
estimation procedures to accommodate discrete (integer-valued) parameters. We outline in section
3.4  the  work  we  have  done  along  those  lines. 

3.1.  A  Static  Decentralized  Adaptive  Load  Sharing  Algorithm. 

As in section 2.1, a distributed processing system with N processors is modeled as a network,
with each node representing a processor. The flow of jobs arriving at node i is denoted by $\lambda_i$,
i = 1, ..., N. The key idea of load sharing is to provide a control mechanism at each node i, so as to
allocate the flow over all nodes. Thus, when a job is received at node i, two decisions are made:

1. Transfer decision: to keep the job at i or send it to some other node.

2. Location decision: to determine the node $j \neq i$ where the job should be sent (if the transfer
decision is not to keep the job at i).

For simplicity, we assume that every node can communicate with every other node (however,
this assumption can be easily relaxed). We will also initially assume that the communication delay
in transferring a job is negligible (this assumption can be relaxed in the future).

Figure 2 shows the model of node i we will use. The responsibility of the "control" function is
to split the flow $\lambda_i$ into flows $x_{ij}$, $x_{ij} \geq 0$. Thus, jobs actually queued at node i for processing may
originate at any one of the locations indexed by i = 1, ..., N. In our model, there is a deadline $\tau_k$
associated with the k-th job for all k = 1, 2, ... If $w_k$ represents the waiting time of the k-th job, then
the job is considered "lost" if $w_k > \tau_k$, i.e., it is rejected from the system and is not processed. The
flow of lost jobs at node i is denoted by $L_i$, and will be referred to as the loss rate at i (in jobs/sec).


Figure  2:  Node  i  Model 

The objective of the load sharing algorithm is to determine the flows $x_{ij}$ for all i, j = 1, ..., N, so
as to minimize the fraction of lost jobs in the overall system (i.e., the probability that a job violates
its real-time constraint). Let $P_L$ denote this fraction, and note that if a total of M jobs were
observed, then the corresponding fraction $P_L(M)$ can be expressed as:

$$P_L(M) = \frac{1}{M} \sum_{i=1}^{N} M_i^L$$

where $M_i^L$ is the number of lost jobs observed at node i. Equivalently, if we fix an observation
interval to be of length T, then we can write:

$$P_L(M) = \frac{1}{M/T} \sum_{i=1}^{N} \frac{M_i^L}{T}$$

By letting $T \to \infty$, (M/T) becomes the total job flow into the system, denoted by f, and ($M_i^L/T$)
becomes the loss rate at node i, defined above as $L_i$. Thus, we get:

$$P_L = \frac{1}{f} \sum_{i=1}^{N} L_i$$
and since f is the fixed total system load, the algorithm must minimize the sum of loss rates over all
processors. Thus, assuming the load at each node is given as $f_i$, the problem can be stated as
follows:

determine $x_{11}, x_{12}, \ldots, x_{ij}, \ldots, x_{NN}$ so as to minimize:

$$\sum_{i=1}^{N} L_i(x_{1i}, \ldots, x_{Ni})$$

subject to

$$x_{ij} \geq 0 \quad \text{for all } i, j = 1, \ldots, N \tag{2a}$$

$$\sum_{j=1}^{N} x_{ij} = f_i \quad \text{for all } i = 1, \ldots, N \tag{2b}$$

Using standard results from optimization theory, one can show that the necessary conditions
for solving this problem are the following:

$$\frac{\partial L_j}{\partial x_{ij}} = \nu_i \quad \text{if } x_{ij} > 0 \tag{3a}$$

$$\frac{\partial L_j}{\partial x_{ij}} \geq \nu_i \quad \text{if } x_{ij} = 0 \tag{3b}$$

for all i, j = 1, ..., N, where $\nu_i$ is some constant to be determined. The derivative $\partial L_j / \partial x_{ij}$ represents
the sensitivity of the loss rate at node j with respect to a change in the flow $x_{ij}$. This is also referred
to as the marginal or incremental loss rate at node j with respect to jobs coming from i. From node
i's point of view, the interpretation of these conditions is as follows: the job flows $x_{ij}$ (allocated by
i) must be set so that all marginal loss rates are equal, provided $x_{ij} > 0$; if $x_{ij} = 0$, then the
corresponding marginal loss rate must be higher (i.e., node j is a particularly bad node). An
algorithmic implementation can now be easily derived, whereby each node gradually adjusts its job
flow allocations until conditions (3a), (3b) are satisfied. The crucial information required for the
execution of such an algorithm consists of the marginal loss rates above.
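
As a hedged sketch of the "standard results" step, conditions (3a), (3b) can be read as the Karush-Kuhn-Tucker conditions of the problem with constraints (2a), (2b). The multipliers $\nu_i$ and $\mu_{ij}$, the Lagrangian symbol $\Lambda$, and the flow of node i to node j written as $x_{ij}$ are notation assumed here for illustration; the derivation in the original work may differ in detail.

```latex
% Lagrangian of: minimize sum_j L_j subject to (2a), (2b)
\Lambda = \sum_{j=1}^{N} L_j(x_{1j},\ldots,x_{Nj})
        - \sum_{i=1}^{N} \nu_i \Bigl( \sum_{j=1}^{N} x_{ij} - f_i \Bigr)
        - \sum_{i=1}^{N}\sum_{j=1}^{N} \mu_{ij}\, x_{ij},
\qquad \mu_{ij} \ge 0 .

% Stationarity in x_{ij} and complementary slackness
\frac{\partial L_j}{\partial x_{ij}} = \nu_i + \mu_{ij},
\qquad \mu_{ij}\, x_{ij} = 0 .

% Hence: x_{ij} > 0 forces \mu_{ij} = 0, which is (3a);
%        x_{ij} = 0 allows \mu_{ij} \ge 0, which is (3b).
```

If, in addition, the loss rate at node j depends on the individual flows only through the total flow entering node j (an assumption consistent with the single-queue node model), then the marginal loss rate at j is the same for every sending node i, so one estimate per node is enough.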

Before describing the distributed algorithm, note that the number of marginal loss rate estimates
needed is actually only N. To see this, let:

$$\phi_{ij} = \frac{x_{ij}}{f_i}$$

representing the fraction of job flows that node i allocates to j.
Next, let us define the following useful quantities:

• a is the minimum marginal loss rate over all nodes. The corresponding node is denoted by
$k_{min}$, and stands for the "best" node under the current load allocation, i.e., the best
candidate for sending additional jobs to (since this node's loss rate will increase the least).

• $a_j$ is the difference between the marginal loss rate at node j and the "best" marginal loss
rate. Note that if $k_{min} = j$, then $a_j = 0$.

• $\Delta_{ij}$ represents the adjustment to be made to $\phi_{ij}$ based on current information. The quantity
$\eta$ is called the step size of the algorithm, and regulates the amount of adjustment to be made
at each iteration.


The  precise  algorithm  execution  is  the  following: 


1. Initialize routing variables $\phi_{ij}$, i, j = 1, ..., N.

2. Wait for some observation period $T_m$, m = 1, 2, ...

3. Iteration m, m = 1, 2, ...:

3.1. Each node j estimates its marginal loss rate $\partial L_j / \partial \rho_j$, j = 1, ..., N (see next section).

3.2. Each node j sends $\partial L_j / \partial \rho_j$ to every other node $i \neq j$.

3.3. Each node i determines a, $a_j$, and $\Delta_{ij}$ (defined above).

3.4. Each node i updates its routing variables $\phi_{ij}$, j = 1, ..., N:


-(7) 


4.  Repeat  steps  2  and  3. 
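
Steps 3.3 and 3.4 at a single node can be sketched as follows. The exact expressions for the adjustment and for update (7) are not legible in this copy of the report, so the rule used below (shift at most `eta * a_j / f_i` of the routing fraction away from each non-best node and give the total to the best node) is an assumed stand-in of the gradient-projection type, not the report's formula. The function name and argument names are likewise hypothetical.

```python
def update_routing(phi_i, marginal, f_i, eta):
    """One routing update at node i (steps 3.3 and 3.4), under an ASSUMED rule.

    phi_i[j]    : current fraction of node i's flow sent to node j (sums to 1)
    marginal[j] : estimated marginal loss rate reported by node j
    f_i         : load submitted at node i (assumed > 0)
    eta         : step size
    """
    a = min(marginal)                 # minimum marginal loss rate
    k_min = marginal.index(a)         # the "best" node
    new_phi = list(phi_i)
    moved = 0.0
    for j, (p, m) in enumerate(zip(phi_i, marginal)):
        if j == k_min:
            continue
        a_j = m - a                   # excess marginal loss rate of node j
        delta = min(p, eta * a_j / f_i)   # assumed adjustment; never drives phi below 0
        new_phi[j] = p - delta
        moved += delta
    new_phi[k_min] += moved           # fractions still sum to 1
    return new_phi
```

Whatever the precise form of (7), any valid update has to keep the routing fractions non-negative and summing to one, which is what the `min` clamp and the transfer to `k_min` ensure in this sketch.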


The  issue  that  remains  is  the  estimation  of  the  marginal  loss  at  each  node  in  step  3.1. 


3.1.2.  Marginal  Loss  Estimation. 

Returning to the node model in Fig. 2, we now address the question: how can we estimate the
sensitivity of $L_j$ with respect to the total incoming flow $\rho_j$? We will restrict ourselves here to one
approach which allows this process to be done on-line, based on the Perturbation Analysis (PA)
methodology.

Consider a node in isolation, and let arriving jobs be indexed by k = 1, 2, ... Let $a_k$ denote the
arrival time of the k-th job, $d_k$ its departure time, and $w_k$ its waiting time in the queue. Furthermore,
suppose each job is assigned a processing time, denoted by $\pi_k$ for the k-th job. In our model, if
$w_k > \tau_k$ the job is lost ($\tau_k$ is the deadline for the job). Thus, if $S_k$ is the actual service time the system
provides this job, we can write:

$$S_k = \begin{cases} \pi_k, & \text{if } w_k \leq \tau_k \\ 0, & \text{otherwise} \end{cases}$$

It is easy to see that the departure time satisfies the following recursive equation:

$$d_k = \max\{ d_{k-1},\ a_k \} + S_k, \quad k = 1, 2, \ldots \tag{8}$$

Similarly, since $d_k = a_k + w_k + S_k$, the waiting time satisfies:

$$w_k = \max\{ w_{k-1} + S_{k-1} - A_k,\ 0 \}, \quad k = 1, 2, \ldots \tag{9}$$

where $A_k$ is the k-th interarrival time defined by $A_k = a_k - a_{k-1}$. Note that $w_k = 0$ whenever the
k-th job terminates an idle period at the processor. Defining:

$$I_k = a_k - d_{k-1} = A_k - w_{k-1} - S_{k-1} \tag{10}$$

note that the duration of an idle period is given by $I_k$ provided $I_k > 0$ in (10).
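
To make recursions (8)-(10) concrete, the following minimal Python sketch applies them to a short list of jobs at one node. The function name `simulate_node` and its list arguments are assumptions for this example, and the node is assumed to start empty at time 0.

```python
def simulate_node(A, pi, tau):
    """Apply recursions (8)-(10) to one node in isolation.

    A[k]   : interarrival time preceding job k
    pi[k]  : processing time requested by job k
    tau[k] : deadline on the waiting time of job k
    Returns (waits, departures, lost).
    """
    waits, departures = [], []
    w_prev = S_prev = 0.0     # waiting and service time of the previous job
    arrival = 0.0
    lost = 0
    for A_k, pi_k, tau_k in zip(A, pi, tau):
        arrival += A_k
        idle = A_k - w_prev - S_prev        # I_k from (10); positive means an idle period
        w = max(-idle, 0.0)                 # waiting time, recursion (9)
        if w > tau_k:                       # deadline missed: job rejected, no service
            S, lost = 0.0, lost + 1
        else:
            S = pi_k
        waits.append(w)
        departures.append(arrival + w + S)  # same value as max{d_{k-1}, a_k} + S_k in (8)
        w_prev, S_prev = w, S
    return waits, departures, lost
```

Note that a rejected job contributes zero service time, so it does not delay the jobs behind it; this is what couples the loss decisions to the waiting-time recursion.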

Now let the incoming flow $\rho$ be perturbed by some amount $\delta\rho$. Equivalently, the mean
interarrival time of jobs $\alpha = 1/\rho$ is perturbed by an amount $\delta\alpha$. Thus, all interarrival times are
perturbed by some amount $\delta A_k$, which is easily obtained from $\delta\alpha$ depending on the job interarrival
time distribution. This causes perturbations $\delta w_k$ in the waiting times $w_k$. These perturbations, in
turn, may affect the service times $S_k$. To understand this perturbation process in more detail, note
that in a perturbed stochastic realization, (9) becomes:

$$w'_k = \max\{ w'_{k-1} + S'_{k-1} - A'_k,\ 0 \}, \quad k = 1, 2, \ldots \tag{11}$$

where $w'_k = w_k + \delta w_k$, $S'_k = S_k + \delta S_k$, and $A'_k = A_k + \delta A_k$. Similar to (10) we can also define
$I'_k$ as follows:

$$I'_k = A_k + \delta A_k - w_{k-1} - \delta w_{k-1} - S_{k-1} - \delta S_{k-1} \tag{12}$$

Our goal now is to determine recursive expressions for $\delta w_k$ and $\delta S_k$ combining the equations
above. As long as these expressions depend only on quantities which are directly observable while
the system is in operation, one can always predict on-line the effect of a flow perturbation.
Following some algebraic manipulations, we get:


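$$\delta w_k = \begin{cases}
0 & \text{if } I_k > 0,\ I'_k > 0 \\
\delta w_{k-1} + \delta S_{k-1} - (I_k + \delta A_k) & \text{if } I_k > 0,\ I'_k \leq 0 \\
\delta w_{k-1} + \delta S_{k-1} - \delta A_k & \text{if } I_k \leq 0,\ I'_k \leq 0 \\
I_k & \text{if } I_k \leq 0,\ I'_k > 0
\end{cases} \tag{13}$$

$$\delta S_k = \begin{cases}
\pi_k & \text{if } w_k > \tau_k,\ \delta w_k \leq \tau_k - w_k \\
-\pi_k & \text{if } w_k \leq \tau_k,\ \delta w_k > \tau_k - w_k \\
0 & \text{otherwise}
\end{cases} \tag{14}$$

where it is important to observe that $\delta w_k$ and $\delta S_k$ are evaluated based on known information: $\delta w_{k-1}$
and $\delta S_{k-1}$ are the iteration variables (initialized to 0); $\pi_k$ and $\tau_k$ are given for every job; $I_k$ is obtained
from (10); $\delta A_k$ is computed based on the flow perturbation of interest and the interarrival time
distribution; and $I'_k$ is obtained from (12) using known values.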

From a computational standpoint, the procedure for obtaining $\delta w_k$ involves only simple
arithmetic and comparisons, as shown in (13), (14). In addition, it is assumed that all arrival time
information is stored with the job (time stamping). The only additional burden is the need to save
prior service time information $\delta S_{k-1}$ in evaluating $\delta w_k$.

Of course, our ultimate objective in the distributed load sharing algorithm is to estimate
derivatives of the form $\partial L / \partial \rho$, where L is the loss rate at the standalone processor model we have
considered here. Given $\delta w_k$, however, this is a relatively simple task. Let $M_k$ be the number of
lost jobs after k jobs have been served (either actually processed or rejected), and let $M'_k$ be the
perturbed value due to $\delta\rho$. We can now evaluate $\delta M_k = M'_k - M_k$ on-line as follows:

$$\delta M_{k+1} = \begin{cases}
\delta M_k - 1 & \text{if } w_k > \tau_k,\ \delta w_k \leq \tau_k - w_k \\
\delta M_k + 1 & \text{if } w_k \leq \tau_k,\ \delta w_k > \tau_k - w_k \\
\delta M_k & \text{otherwise}
\end{cases} \tag{15}$$

Finally, an estimate of the derivative $\partial L / \partial \rho$ is given by:

$$\left[ \frac{\partial L}{\partial \rho} \right]_{est} = \frac{\delta M_k}{d_k} \cdot \frac{1}{\delta\rho}$$

where $d_k$ defines the length of the observation period on which the estimate is based, $\delta\rho$ is
sufficiently small, and $\delta M_k$ is obtained from (15), using (13) and (14) to evaluate $\delta w_k$.
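
The on-line procedure of this section can be sketched in Python as a single pass over the observed jobs. This is an illustrative sketch under stated assumptions, not the report's code: the names `pa_loss_perturbation` and `count_lost` are hypothetical, the interarrival perturbations `dA` are taken as given inputs, and the node is assumed to start empty. The helper `count_lost` simply re-simulates a sample path and is included only so the prediction can be checked.

```python
import random

def pa_loss_perturbation(A, pi, tau, dA):
    """Evaluate recursions (9), (10), (12)-(15) along one observed sample path.

    A[k], pi[k], tau[k]: interarrival time, processing time, deadline of job k.
    dA[k]: perturbation of A[k] induced by the flow perturbation (assumed given).
    Returns (M, dM, d): nominal lost jobs, predicted change in lost jobs,
    and the nominal departure time of the last job.
    """
    w_prev = S_prev = 0.0       # nominal w_{k-1}, S_{k-1}
    dw_prev = dS_prev = 0.0     # iteration variables, initialized to 0
    arrival = d = 0.0
    M = dM = 0
    for A_k, pi_k, tau_k, dA_k in zip(A, pi, tau, dA):
        I = A_k - w_prev - S_prev                 # (10)
        I_pert = I + dA_k - dw_prev - dS_prev     # (12)
        w = max(-I, 0.0)                          # (9)
        if I > 0 and I_pert > 0:                  # (13), four cases
            dw = 0.0
        elif I > 0:
            dw = dw_prev + dS_prev - (I + dA_k)
        elif I_pert <= 0:
            dw = dw_prev + dS_prev - dA_k
        else:
            dw = I
        lost = w > tau_k                          # outcome on the nominal path
        lost_pert = dw > tau_k - w                # outcome on the perturbed path
        S = 0.0 if lost else pi_k
        if lost and not lost_pert:                # (14) and (15)
            dS, dM = pi_k, dM - 1
        elif lost_pert and not lost:
            dS, dM = -pi_k, dM + 1
        else:
            dS = 0.0
        M += lost
        arrival += A_k
        d = arrival + w + S                       # (8)
        w_prev, S_prev, dw_prev, dS_prev = w, S, dw, dS
    return M, dM, d

def count_lost(A, pi, tau):
    """Directly simulate the node and count jobs whose wait exceeds the deadline."""
    w = S = 0.0
    lost = 0
    for A_k, pi_k, tau_k in zip(A, pi, tau):
        w = max(-(A_k - w - S), 0.0)
        if w > tau_k:
            S, lost = 0.0, lost + 1
        else:
            S = pi_k
    return lost
```

A derivative estimate then follows as `dM / (d * d_rho)` for the flow perturbation `d_rho` that produced `dA`. For interarrival distributions that scale with their mean (the exponential case, for example), one natural choice, assumed here rather than taken from the report, is `dA[k] = A[k] * d_alpha / alpha`.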

3.2.  Simulation  Results. 

In  this  section,  we  present  results  from  the  algorithm  implementation  on  simulated  distributed 
processing  systems.  Our  objectives  here  are: 

•  to  demonstrate  the  convergence  of  this  load  sharing  algorithm,  both  for  simple  models 
(where  analytical  solutions  may  be  found),  and  more  complex  ones  (for  which  analytical 
solutions  are  not  available). 

•  to  demonstrate  the  adaptive  nature  of  the  algorithm,  where  flows  are  automatically  adjusted 
in  response  to  drastic  changes  in  the  system’s  operating  conditions. 

• to study the effect of the two parameters affecting the performance of this algorithm: the
step size $\eta$, and the observation period length T, which defines the points where flow
adjustments are made.

In the results that follow, we denote by $A_i$ and $S_i$ the arrival and service process characteristics at
node i. We also denote by $C_i$ the deadline (waiting time constraint) distribution. For instance, A2:
EXP(1.0) indicates that the interarrival times at node 2 are exponentially distributed with mean 1.0;
C2: CO(2.0) indicates that all jobs submitted to node 2 have a constant deadline fixed at 2.0 sec.

In  the  first  few  cases  studied,  we  have  considered  a  four-node  system  and  examined  the 
following  cases. 

Case 1: A1, A2, A3, A4: EXP(1.0)

S1: EXP(1.0), S2, S3, S4: EXP(4.0)

C1, C2, C3, C4: CO(2.0)

In Fig. 3, for a fixed observation period of length T, defined by 30,000 jobs per iteration, we
show how the performance (in terms of fraction of lost jobs) improves as a function of time and
convergence is attained, provided the step size $\eta$ is sufficiently small. Note the instability resulting
at $\eta = 5.0 \times 10^{-1}$. In Fig. 4, we study the effect of observation period length for a fixed step size
$\eta = 5.0 \times 10^{-2}$. For the case of 1,000 jobs per iteration only, performance initially tends to the
optimum very rapidly; subsequently, however, the loss fraction experiences oscillations due to the
high variance of the marginal loss estimates.

Case 2: A1, A2, A3, A4: EXP(1.0)

S1: EXP(1.0), S2, S3, S4: EXP(4.0)

C1, C2, C3, C4: UN[1.5,2.5]

The only change here is in the deadline distribution: deadlines are now drawn from a uniform
distribution in [1.5,2.5] (mean deadline is the same). Results are shown in Fig. 5, with $\eta = 5.0 \times 10^{-2}$
and 20,000 jobs per iteration. Note that the loss fraction converges around 0.60 as before.

Case 3: A1, A2, A3, A4: EXP(1.0)

S1: UN[0.5,1.5], S2, S3, S4: UN[3.5,4.5]

C1, C2, C3, C4: CO(2.0)

In Fig. 6, we show algorithm convergence with $\eta = 5.0 \times 10^{-2}$ and 20,000 jobs per iteration. In this
case,  however,  node  service  times  are  bounded  through  uniform  distributions.  It  is  still  possible  to 
obtain  analytical  expressions  for  loss  rates  in  this  system;  however,  our  aim  here  is  simply  to 
demonstrate  the  validity  of  our  estimation  procedure,  which  does  not  require  any  change  compared 
to  the  previous  models. 

Case 4: A1, A2, A3, A4: UN[0.5,1.5]

S1: UN[0.5,1.5], S2, S3, S4: UN[3.5,4.5]

C1, C2, C3, C4: CO(2.0)

This is similar to Case 3, except for the interarrival time processes, which are also uniform. No
analytical expressions are available for such node models. Results with η = 5.0×10^-2 and 20,000
jobs per iteration are shown in Fig. 7.

Figure 5: Uniformly Distributed Waiting Time Constraint (4 nodes, Ca...)

Figure 11: Effect of Observation Period Length on Load Sharing Algorithm (7 nodes). Axes: fraction
of jobs lost versus time (×2000); curves for 20,000 and 30,000 jobs per iteration.


Case 5: A1: UN[0.5,1.5], A2: EXP(4.0), A3: CO(5.0), A4: DP(3.0,0.8; 8.0,0.2)

S1: UN[0.5,1.5], S2, S3, S4: UN[3.5,4.5]

C1, C2, C3, C4: CO(2.0)

In this case, arrival processes are quite different at each node. At node 4, we have a discrete
probability distribution, i.e. the interarrival time is 3.0 with probability 0.8 and 8.0 with
probability 0.2. Once again, we show convergence in Fig. 8, with η = 5.0×10^-2 and 20,000 jobs
per iteration.
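
The distribution shorthand used in these cases can be made concrete with a small sampler. The sketch below is illustrative only: the function name `sample` and the tuple encoding of each specification are assumptions introduced here and are not part of the original experiments.

```python
import random

def sample(spec, rng):
    """Draw one value from a distribution written in the report's shorthand.

    spec is a tuple (hypothetical encoding chosen for this sketch):
      ("EXP", mean)                 exponential with the given mean
      ("UN", low, high)             uniform on [low, high]
      ("CO", value)                 constant
      ("DP", [(value, prob), ...])  discrete probability distribution
    """
    kind = spec[0]
    if kind == "EXP":
        return rng.expovariate(1.0 / spec[1])  # expovariate expects a rate
    if kind == "UN":
        return rng.uniform(spec[1], spec[2])
    if kind == "CO":
        return spec[1]
    if kind == "DP":
        values = [v for v, _ in spec[1]]
        probs = [p for _, p in spec[1]]
        return rng.choices(values, weights=probs, k=1)[0]
    raise ValueError(f"unknown distribution kind: {kind}")

# Case 5, node 4: interarrival time 3.0 w.p. 0.8 and 8.0 w.p. 0.2.
rng = random.Random(0)
a4 = ("DP", [(3.0, 0.8), (8.0, 0.2)])
draws = [sample(a4, rng) for _ in range(10000)]
mean_interarrival = sum(draws) / len(draws)  # near 0.8*3.0 + 0.2*8.0 = 4.0
```

With this encoding, the mean interarrival time at node 4 in Case 5 is 4.0, the same as the mean of the EXP(4.0) process at node 2.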

Case 6: A1: EXP(1.0), A2: UN[3.5,4.5], A3: UN[7.5,8.5], A4: EXP(3.0)

S1: EXP(1.0), S2: EXP(4.0), S3: EXP(6.0), S4: EXP(8.0)

C1, C2, C3, C4: CO(2.0)

Our purpose here is to demonstrate the adaptive properties of this algorithm in a general system
with different arrival processes and inhomogeneous processors (with mean service times of 1.0 and
8.0 respectively, node 1 is 8 times faster than node 4).

In Fig. 9 we show the behavior of the algorithm when node 1 experiences a degradation by a
factor of 20 (i.e. the mean service time becomes 20.0 after the 20th iteration). As expected, the
fraction of jobs lost immediately increases (from about 0.33 to about 0.78). The load sharing is
then gradually adjusted to a new optimal allocation with a loss fraction of about 0.70. Finally, the
initial service rate of node 1 is restored, and load sharing gradually returns to an allocation yielding
a loss fraction of about 0.33.

In the remaining cases examined, we have considered a seven-node system. As shown in
Figures 10-11, the basic properties of the algorithm are unaffected by an increase in the size of the
system. The main effect is in the selection of appropriate step size and observation periods, which
become more constrained. In Fig. 10, note that the algorithm tends to become unstable even if
η = 5.0×10^-2, a value which was sufficiently small in the four-node model.

3.3.  Extensions  to  Dynamic  Load  Sharing.

For the static distributed load sharing algorithm considered above, it is clear that the choice of step
size and observation period length is critical in guaranteeing fast and reasonably smooth
convergence. Several simple enhancements are immediately apparent but remain to be implemented.
Specifically, there is no reason that these two parameters should remain fixed throughout
the algorithm execution; it is reasonable to start out with a large step size and short observation
periods, which can provide fast initial improvement. Subsequently, these parameters can be
adjusted to avoid instability and to gradually approach optimal performance.
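
One way such a schedule might look is sketched below. This is only an illustration of the idea of shrinking the step size while lengthening the observation period; the function name `schedule`, the geometric form, and every constant are assumptions made here, not choices tested in this project.

```python
def schedule(iteration, eta0=0.2, eta_min=0.05, decay=0.8,
             jobs0=1000, jobs_max=30000, growth=1.5):
    """Return (step_size, jobs_per_iteration) for a given iteration index.

    The step size shrinks geometrically from eta0 down to eta_min, while the
    observation period grows geometrically from jobs0 up to jobs_max.
    All constants are illustrative assumptions.
    """
    eta = max(eta_min, eta0 * decay ** iteration)
    jobs = min(jobs_max, int(jobs0 * growth ** iteration))
    return eta, jobs

# First twelve iterations of the hypothetical schedule.
plan = [schedule(k) for k in range(12)]
```

Early iterations then use short, noisy observation periods with large steps, and later iterations use long periods with small steps, which is the qualitative behavior suggested above.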

Another issue that remains to be addressed is that of the effect of communication delays in
transferring jobs through a network. When such delays are not negligible, we can no longer
replace the individual marginal losses ∂Lj/∂xij by the single derivative ∂Lj/∂ρj, where ρj is the total
flow into node j. In other words, rather than a single class of jobs, node j must now distinguish
between N classes (depending on the source of the job), or at least two classes: local and remote.

The next interesting task is that of extending load sharing to include instantaneous state
information, such as the job backlog (queue length) Xi at node i. One can then investigate
threshold-based load sharing schemes, which operate as follows:

•  whenever a job is submitted to node i, check the queue length Xi and compare it to some
specified threshold Ti (to be determined).

•  if Xi < Ti, then keep the job at node i.

•  if Xi ≥ Ti, then send the job to some other node, using routing variables xij, j ≠ i.

Thus, the problem here involves adjusting both the thresholds and the routing variables. Note that
Ti is integer-valued, hence standard gradient estimation techniques (including PA) are not
applicable.  It  is  often  the  case  that  such  systems  are  characterized  by  discrete  (integer-valued) 
parameters.  Part  of  our  work  has  therefore  focused  on  investigating  how  to  obtain  sensitivity 
estimates  in  this  case.  This  work  is  outlined  in  section  3.4,  and  described  in  more  detail  in  [8]  (also 
Appendix  B). 
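
The threshold-based scheme outlined in this section can be sketched as follows. This is a minimal illustration, assuming that the routing variables for node i are normalized into probabilities over the other nodes; the function name `route_job` and the example values are hypothetical.

```python
import random

def route_job(i, queue_len, thresholds, routing, rng):
    """Return the node that should receive a job submitted to node i.

    queue_len[i]  -- current backlog Xi at node i
    thresholds[i] -- integer threshold Ti
    routing[i]    -- dict {j: probability} over the other nodes j != i
                     (assumed normalization of the routing variables)
    """
    if queue_len[i] < thresholds[i]:
        return i  # backlog below threshold: keep the job locally
    nodes = list(routing[i])
    weights = [routing[i][j] for j in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

rng = random.Random(1)
queue_len = [5, 0, 2]
thresholds = [3, 3, 3]
routing = {0: {1: 0.7, 2: 0.3}, 1: {0: 0.5, 2: 0.5}, 2: {0: 0.5, 1: 0.5}}
dest_overloaded = route_job(0, queue_len, thresholds, routing, rng)  # node 1 or 2
dest_light = route_job(1, queue_len, thresholds, routing, rng)       # stays at node 1
```

The decision depends on the thresholds only through an integer comparison, which is why small changes in a threshold cannot be treated as a continuous perturbation.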

3.4.  Sensitivity  Analysis  for  Distributed  Processing  Systems  with  Discrete
Parameters.

The problem we have addressed is the following. Suppose a system is characterized by several
discrete parameters (such as the thresholds defined above). The values selected for these
parameters can drastically affect the performance of the system. Furthermore, these parameters can
be used to automatically adjust the system to changing operating conditions (e.g. processor
failures, sudden traffic increases). However, making such adjustments requires
knowledge of the performance sensitivity with respect to the parameters. This information is
generally very hard to obtain, since the functional relationship between performance measures and
parameters is not available.

The  main  idea  we  have  investigated  is  that  of  modeling  systems  of  interest  through  augmented 
Markov  or  semi-Markov  chain  models.  We  have  developed  a  general  framework  for  obtaining  the 
types  of  sensitivities  mentioned  above,  and  have  verified  its  validity  for  some  simple  cases  (see 
Appendix B). This approach is still based on direct observation of a system in operation, and
requires little overhead. It remains to use this approach in order to implement a dynamic
load-sharing scheme as described in the previous section.

Another  area  where  this  approach  appears  to  be  promising  is  that  of  scheduling  different  types 
of  jobs  at  a  processor.  This  is  a  complex  problem  of  significant  practical  interest,  since  it  is  often 
the  case  that  jobs  are  classified  in  terms  of  priority,  real-time  constraints,  execution  length,  or  other 
characteristics. We present a brief overview of the problem in the next section.

3.4.1.  Dynamic  Processor  Scheduling. 

As shown in Fig. 12, the scheduling problem involves selecting the next job to be processed from
a collection of K queues, each representing a different class. In the simple case where the processing
rate of class k is μk and a measure of priority is represented by the waiting cost per unit time ck, it
can be shown that the policy minimizing the mean job delay is a simple static one: always process a
job from the class with the highest (μk ck) value [9]. If, however, real-time constraints are present,
queue capacities are limited, or other complications are introduced, a dynamic scheduling policy is
expected to provide better performance. For some simple cases, we have early results showing that
threshold-based policies are in fact optimal.
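
The static rule described above (serve the class with the highest product of processing rate and waiting cost) can be written in a few lines. The sketch below is illustrative; the function name `c_mu_choice` and the numerical values are assumptions, and ties are broken in favor of the lowest class index simply because that is what Python's `max` does.

```python
def c_mu_choice(queue_lengths, mu, c):
    """Pick the nonempty class with the largest mu[k] * c[k]; None if all empty."""
    candidates = [k for k, n in enumerate(queue_lengths) if n > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda k: mu[k] * c[k])

mu = [2.0, 1.0, 4.0]   # processing rates (illustrative values)
c = [1.0, 3.0, 0.5]    # waiting costs per unit time (illustrative values)
# Products mu[k] * c[k] are 2.0, 3.0 and 2.0, so class index 1 has priority.
first = c_mu_choice([1, 1, 1], mu, c)
```

Because the index of each class is fixed in advance, the rule never looks at how long the queues are, only at whether they are empty; this is the sense in which the policy is static.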


Figure 12: The Processor Scheduling Problem

The policy we have formulated, to be analyzed using the augmented chain approach described in
Appendix B, is the following. Suppose job classes have been prioritized so that the highest priority
jobs are in queue 1, and so on. Our objective here may be to minimize a combination of average
delays and loss fractions for jobs with deadlines. Then, for the case where K=2, consider:

•  if N1 > T1, process class 1

•  if N1 ≤ T1, then: if N2 > T2, serve class 2, otherwise serve class 1.

In this scheme, the processor only serves class 2 jobs if queue 1 is sufficiently low and queue 2
sufficiently high. As in other threshold-based policies, the question is that of determining the
optimal values of T1, T2. This problem remains to be solved, and comparisons with other
scheduling policies remain to be made.
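
The two-class rule above can be sketched as a small decision function, reading N1 and N2 as the current lengths of queues 1 and 2. The function name `next_class` is hypothetical, and the handling of an empty queue 1 (serve class 2 rather than idle) is an assumption added here; the policy as stated does not cover that case.

```python
def next_class(n1, n2, t1, t2):
    """Return the class (1 or 2) to serve next, or None if both queues are empty."""
    if n1 == 0 and n2 == 0:
        return None
    if n1 > t1:
        return 1            # queue 1 is above its threshold
    if n2 > t2:
        return 2            # queue 1 is low and queue 2 is high
    # "Otherwise serve class 1"; if queue 1 is empty we assume the processor
    # does not idle and serves class 2 instead (work-conserving assumption).
    return 1 if n1 > 0 else 2

decisions = [next_class(5, 0, 3, 2), next_class(2, 4, 3, 2), next_class(2, 1, 3, 2)]
```

With thresholds T1 = 3 and T2 = 2, the three calls select class 1, class 2, and class 1 respectively.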

4.  DESIGN  AND  ANALYSIS  OF  PARALLEL  PROCESSING  SYSTEMS. 

In this section we report on two problems related to parallel processing systems that we studied.
In the first problem we studied the behavior of two different scheduling policies on a
multiprogrammed multiprocessor that executes parallel programs. In the second problem, we
developed mathematical models for a class of parallel systems that can be modeled as acyclic
fork-join queueing networks. We report on each of these in the remainder of this section.

4.1.  Multiprocessor  scheduling 

We studied the performance of first-come-first-served (FCFS) and processor sharing (PS) policies
for scheduling parallel programs on a multiprogrammed multiprocessor. Specifically, we
developed analytic models that predict the behavior of PS when used to schedule fork/join jobs
onto a multiprocessor and compared its performance to FCFS. Here a fork/join job consists of a
number of tasks that can be executed independently of each other. The job is not considered to be
complete until the last task completes. The fork/join job is the simplest nontrivial example of a
parallel job.
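
Since a fork/join job completes only when its last task completes, its execution time on otherwise idle processors is the maximum of its task times. The sketch below illustrates this under simplifying assumptions that are much narrower than the models analyzed in this project: a single job in isolation, one dedicated processor per task, and exponential task times with rate mu. For that case the expected maximum is the standard harmonic-sum expression used in `expected_max_exponential`; all names are hypothetical.

```python
import random

def expected_max_exponential(n, mu):
    """E[max of n i.i.d. exponential(mu) variables] = (1 + 1/2 + ... + 1/n) / mu."""
    return sum(1.0 / k for k in range(1, n + 1)) / mu

def simulate_fork_join(n, mu, runs, rng):
    """Average completion time of an isolated job with n parallel tasks."""
    total = 0.0
    for _ in range(runs):
        # The job finishes when its slowest task finishes.
        total += max(rng.expovariate(mu) for _ in range(n))
    return total / runs

rng = random.Random(42)
analytic = expected_max_exponential(4, 1.0)       # 1 + 1/2 + 1/3 + 1/4
simulated = simulate_fork_join(4, 1.0, 20000, rng)
```

The expected completion time grows only logarithmically with the number of tasks, whereas executing the same tasks sequentially takes time proportional to their number.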

We  developed  an  analytic  model  that  provides  tight  bounds  on  the  expected  response  time  of  a 
fork/join  job  under  the  assumptions  that  jobs  arrive  to  the  multiprocessor  according  to  a  Poisson 
process  and  that  task  service  times  are  independent  and  identically  distributed  exponential  random 
variables. Details of the analysis can be found in [10] (also Appendix C). We study two PS
disciplines, one called task scheduling processor sharing, the other job scheduling processor
sharing. The first policy schedules tasks independently of each other, thus allowing parallel
execution, whereas the second policy schedules entire jobs to individual processors and therefore
does not allow parallel execution of a job. We find that task scheduling does not always
outperform job scheduling. Specifically, job scheduling always performs better when the
processor utilizations are high. This is because at high utilizations there is little advantage to parallel
execution of a single job. On the other hand, task scheduling gives preference to jobs with many
tasks over jobs with few tasks, unlike job scheduling, which gives equal preference to all jobs.
Consequently, small jobs complete more quickly at high utilizations under job scheduling.
We  also  compare  processor  sharing  with  FCFS.  We  find  that  FCFS  outperforms  processor
sharing  for  a  large  class  of  workloads.  We  also  compare  the  performance  of  processor  sharing  and 
FCFS  for  systems  with  two  classes  of  jobs.  We  find  that  the  system  performs  poorly  when  the 
processors  are  partitioned  between  the  classes  as  compared  to  a  system  that  shares  the  processors 
amongst  all  jobs. 

There remain many unanswered questions. These include: What are the effects of priorities on
the behavior of different classes of jobs? What are the effects of real-time constraints? How should
job and task scheduling be integrated to achieve the best features of each policy?

4.2.  Models  of  Parallel  Systems. 

We studied a class of acyclic fork-join queueing networks (AFJQNs) that arise in the performance
analysis of parallel processing applications. We obtained the maximum throughputs and developed
upper  and  lower  bounds  on  the  response  times  of  jobs  that  execute  in  these  systems.  We  describe 
what  an  AFJQN  is  and  the  results  of  our  analysis  in  the  remainder  of  this  section. 

AFJQNs arise naturally in parallel processing applications. Many parallel programs are
decomposed into tasks, each of which can execute on a separate processor. The division of the
parallel program into tasks can be described by a directed graph where the nodes of the graph
correspond to tasks and directed edges represent the precedence relations between tasks. In many
cases the underlying graph is acyclic and the program is implemented with the use of fork and join
constructs. Briefly, a fork exists at each point in a parallel program at which one or more tasks can be
initiated simultaneously. (Strictly speaking, a fork implies that two or more tasks are started;
however, our definition simplifies the notation required for the analysis.) A join occurs whenever a
task is allowed to begin execution following the completion of one or more tasks. Forks and joins
are reflected in the underlying computation graph in the following manner. A task that has one or
more outgoing edges corresponds to a fork. A task with one or more incoming edges corresponds
to a join. These are exemplified in the parbegin and parend constructs available in parallel
programming languages such as Concurrent Pascal [11], Communicating Sequential Processes
(CSP) [12], and Ada [13].


Figure 13: (a) A parallel program, (b) Associated AFJQN.


Consider a multiprocessor where each task of a specific program is mapped onto a separate
processor. The execution of a single program request can be described as follows: (i) Upon
completion of a marked task, tokens associated with the program are routed to each processor
handling the tasks that follow the marked task in the underlying computation graph; (ii) Once a
processor has received tokens from all tasks that precede a marked task in the computation graph,
this processor is allowed to execute it. Let this multiprocessor be required to service a stream of
requests corresponding to different instances of that program and assume each processor executes
its tasks in the same order that program requests arrive to the system. We have described, in brief,
an AFJQN. Figure 13a illustrates a hypothetical parallel program using forks and joins, and Fig.
13b illustrates the corresponding fork-join queueing network.
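
The token-passing description above translates into a simple recursion: a task of a given request starts when its processor has finished the same task of the previous request and the tokens from all of its predecessors have arrived. The following sketch computes completion times on that basis. It is an illustration rather than the analysis of Appendix D; the function name, the four-task example graph, and the service times are all hypothetical.

```python
def afjqn_completion_times(preds, arrivals, service):
    """Completion time of every task for every job in an acyclic fork-join network.

    preds    -- dict {task: list of predecessor tasks}, with keys listed in
                topological order (predecessors before successors)
    arrivals -- arrival time of each job, in nondecreasing order
    service  -- service[n][task] is the service time of `task` for job n

    Each task has its own processor, which serves jobs in arrival order, so
      start = max(processor free, all predecessor tokens for this job received)
    where a task with no predecessors waits for the job arrival instead.
    """
    done = []
    for n, arrival in enumerate(arrivals):
        finish = {}
        for task, before in preds.items():
            ready = max((finish[p] for p in before), default=arrival)
            free = done[n - 1][task] if n > 0 else 0.0
            finish[task] = max(ready, free) + service[n][task]
        done.append(finish)
    return done

# Hypothetical program: A forks into B and C, which join at D.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
arrivals = [0.0, 1.0]
service = [{"A": 1.0, "B": 2.0, "C": 3.0, "D": 1.0}] * 2
done = afjqn_completion_times(preds, arrivals, service)
response_times = [done[n]["D"] - arrivals[n] for n in range(len(arrivals))]
```

In this example the first request takes 5.0 time units, while the second takes 7.0 because it queues behind the first at the processors handling tasks B and C.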

This  class  of  queueing  networks  has  not,  in  general,  been  solved.  In  our  work,  we  have 
obtained  expressions  for  the  maximum  throughput  in  job  requests  per  unit  time  that  can  be 
processed  for  an  arbitrary  computation  graph  where  the  number  of  processors  is  at  least  as  large  as 
the  number  of  tasks  and  for  very  general  assumptions  on  the  job  request  process  and  service  time 
requirements  of  all  of  the  tasks.  In  addition,  we  have  obtained  upper  and  lower  bounds  on  the 
expected  program  execution  time  through  the  use  of  stochastic  ordering  principles  (see  [14]).  We 
have  shown,  for  example,  that  decreased  (increased)  variability  in  the  time  between  job  requests 
results  in  a  decrease  (increase)  in  the  job  execution  time.  Consequently  we  can  numerically  obtain 
bounds by assuming that the times between job arrivals are constant. In addition, we have shown
that if we assume that the times required to traverse each path between the source and the
destination in the AFJQN are mutually independent, then we obtain a pessimistic bound on the
average response time by taking the average of the maximum of the times over all paths between
source and destination. Details of the analysis can be found in [15] (also Appendix D).

A number of tasks remain to be done and a number of interesting questions remain to be
answered. For example, we have not developed a software system to actually calculate bounds on
the mean program execution time. In addition, there are numerous other parallel processing
architectures to be considered, such as one where tasks are not mapped to processors but, rather,
a processor is allowed to execute any task that is ready for execution. The work reported above
does not address some of the issues raised in real-time systems. For example, what is the
probability that a job will miss a deadline?

5.  CONCLUSIONS. 

We  have  addressed  both  system-level  and  node-level  issues  in  distributed  systems.  At  the  system 
level,  we  have  considered  load  sharing  for  jobs  with  real-time  constraints,  and  determined  that 
simple  policies  can  provide  performance  very  near  the  ideal  optimum.  We  have  also  derived  and 
tested  load  sharing  algorithms  which  can  be  implemented  under  general  conditions,  requiring  no 
specific  modeling  assumptions  or  knowledge  of  system  parameters.  At  the  node  level,  we  have 
formulated  a  task  scheduling  problem,  and  have  investigated  some  parallelism  issues  for  the  case 
of  multiprocessor  nodes.  We  have  determined  that  the  advantages  of  parallelism  are  dependent  on 
several  factors,  and  that  a  simple  FCFS  approach  is  occasionally  preferable. 

In  the  development  of  adaptive  load  sharing  algorithms,  we  have  limited  ourselves  to  the  static 
case.  We  have,  however,  obtained  in  the  course  of  our  work  a  general  framework  for  on-line 
marginal  loss  estimation  to  be  used  for  extensions  to  the  dynamic  case.  This  is  the  subject  of  future 
work.  Furthermore,  an  issue  to  be  addressed  is  that  of  the  interaction  between  the  system  level  and 
node  control  in  the  presence  of  real-time  constraints.  The  task  scheduling  problem  itself  also 
remains to be addressed in detail; our results to date have generated a suitable framework for
accomplishing  this  in  the  near  future.  Finally,  our  work  on  parallelism  issues  has  given  rise  to  a 
number  of  problems,  such  as  the  effect  of  priorities,  and  the  question  of  effectively  combining  job 
and  task  scheduling  to  achieve  the  best  features  of  each. 

Abstract

In soft real-time distributed computer systems, a job submitted at a node in the network
must complete or begin execution within a specified time constraint, otherwise it is considered
lost. When a single node occasionally experiences an overload of jobs, it may still be possible
to execute some of the otherwise lost jobs by invoking a load sharing algorithm to distribute the
local overload to other system nodes. We examine several relatively simple approaches to load
sharing and show that these simple real-time load sharing algorithms may often perform as well
as their more complex counterparts. Approximate analytic performance models are developed and
validated through simulation. The performance results suggest that, over a relatively wide range of
system parameters, the performance of these simple approaches is substantially better than the
case of no load sharing and often close to that of a theoretically optimum algorithm.

1  Introduction


A primary motivating factor behind the development of distributed computer systems has been the
need to efficiently utilize the resources available within the distributed environment. In this paper,
we consider the case of sharing the computational resources of the system nodes. This can be done
by transferring jobs which are submitted to heavily loaded nodes to more lightly loaded nodes. This
process of sharing the workload over the entire system is generally known as load balancing or load
sharing (LS). Although a cost (e.g., a time delay) is typically incurred by transferring a job from one
node to another, the performance of a distributed computer system can generally be improved by an
effective load sharing policy [20].

In this paper, we study soft real-time systems. Real-time tasks can essentially be classified into
two types: (1) tasks which must begin execution within a specified amount of time after their initial
arrival to the system and (2) tasks which must complete execution within a fixed amount of time after
their initial arrival to the system. The first set of jobs is characterized by a bounded queueing time
whereas the second is characterized by a bounded waiting time. For both types of jobs, those failing to
meet their deadline are considered lost. One important purpose of load sharing in a real-time system
then is to minimize this percentage of jobs lost. Examples of systems exhibiting such soft real-time
behaviour include applications in distributed systems for industrial process control [23], autonomous
manufacturing [3], and air traffic control [9]. In these applications, results of a computation are typically
needed in order to perform some control function at a given point in time. Failure of a job to meet
its deadline may then require the initiation of a recovery procedure, which can be very costly from a
performance standpoint [9].
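
The distinction between the two types of real-time jobs can be stated as a small predicate. The sketch below is illustrative only; the function name `job_is_lost` and its argument names are assumptions introduced here.

```python
def job_is_lost(queueing_time, service_time, deadline, bounded="queueing"):
    """Return True if the job misses its deadline.

    bounded="queueing": the job must begin execution within `deadline`.
    bounded="waiting":  the job must complete execution within `deadline`.
    """
    if bounded == "queueing":
        return queueing_time > deadline
    if bounded == "waiting":
        return queueing_time + service_time > deadline
    raise ValueError("bounded must be 'queueing' or 'waiting'")

# A job that waits 1.5 in queue and needs 1.0 of service, with deadline 2.0.
starts_in_time = not job_is_lost(1.5, 1.0, 2.0, bounded="queueing")
finishes_in_time = not job_is_lost(1.5, 1.0, 2.0, bounded="waiting")
```

The same job meets a bounded queueing time of 2.0 but misses a bounded waiting time of 2.0, since its service time counts against the deadline only in the second case.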

It has been previously noted that for non-real-time systems, relatively simple decentralized policies
may often provide effective load sharing in a distributed system [21] [7]. These works, in particular the
analytic work in [7], motivate our present work which establishes complementary results for the case
of real-time systems, systems having performance requirements and evaluation metrics which differ
significantly from those of non-real-time systems. We stress that, as in [7], our goal here is not to
propose any specific real-time load sharing algorithm nor to necessarily develop performance models
for predicting the absolute performance of specific LS approaches, but rather to address the more
fundamental question of the level of complexity required to implement effective load sharing, in this
case in a distributed real-time environment.

In this paper, we adopt an analytic approach towards evaluating various approaches towards
real-time load sharing. In section 2, we review previous work in the area of real-time load sharing and
then, in section 3, describe the distributed system model used in this paper. Our analysis for tasks
with bounded queueing time begins in section 4, and we adopt the general methodology of [7] (also [25])
of first developing a model for a single node in isolation and then combining these node-level models
into a single system-level model. In section 4.1, we first develop a model of job loss from a generic
system node in isolation. Because we are interested in studying real-time performance, our model of a
node is necessarily different from that traditionally adopted in load balancing studies for non-real-time
systems. In particular, rather than adopting a Markov chain model based on the number of jobs queued
for execution at a node, we characterize a node's state by its amount of "unfinished work" and derive
a set of integro-differential equations governing the evolution of unfinished work at a node. We must
additionally carefully distinguish between locally-arriving jobs and transferred jobs, since the latter
arrive with tighter time constraints due to the transfer delay incurred.

In section 4.2, we then compose instances of this generic node model to create a system-level model
for the entire distributed system. Central to this composition is the assumption (first introduced in
[7], and also used in [25]) of independence among the states of different nodes, an assumption we
later validate for our system under study through simulation. We then use this system-level model
to quantitatively study the real-time performance of two simple approaches towards real-time load
sharing. In both of the approaches studied, a job whose deadline cannot be met locally may be
transferred to a remote node for possible execution. In the first approach, termed "quasi-dynamic load
sharing" (QDLS), a job which cannot meet its deadline locally is sent to a probabilistically chosen
remote node. This job will then be either successfully executed or lost at the remote node. We note
that the policy of probabilistically selecting a remote node for execution has been extensively studied
for the non-real-time case [15,19,21,24,7]; the policy of transferring jobs when real-time constraints
cannot be met locally, however, has not been examined in any previous studies. The second approach
studied is the probing approach examined in [7] for the case of non-real-time systems. In this approach,
a node may probe some limited number of other system nodes and then transfer a job if one of these
nodes can execute the job within its deadline. If none of the probed nodes can do so, the job is then
lost. Finally, we compare the QDLS and probing policies to the bounding cases of no load sharing and
the theoretically optimum LS algorithm. A similar study is performed for jobs with bounded waiting
time in section 5. However, we shall restrict ourselves to the simple probing policy, as the model
becomes considerably more complex.
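
The two policies described above can be sketched for jobs with bounded queueing time as follows. This is a simplified illustration and not the analytic model of sections 4 and 5. It assumes that each node's state is its unfinished work in time units, that a transferred job loses `delay` time units of its deadline, that probes are free and instantaneous, and that probed nodes are chosen uniformly at random; the function names and parameters are hypothetical.

```python
import random

def qdls(i, work, deadline, delay, probs, rng):
    """Quasi-dynamic load sharing: return the executing node, or None if lost."""
    if work[i] <= deadline:
        return i                          # deadline can be met locally
    nodes = list(probs[i])
    j = rng.choices(nodes, weights=[probs[i][k] for k in nodes], k=1)[0]
    # The transferred job arrives with a tighter constraint.
    return j if work[j] <= deadline - delay else None

def probing(i, work, deadline, delay, probe_limit, rng):
    """Probe up to probe_limit other nodes; transfer to the first that can serve."""
    if work[i] <= deadline:
        return i
    others = [k for k in range(len(work)) if k != i]
    for j in rng.sample(others, min(probe_limit, len(others))):
        if work[j] <= deadline - delay:
            return j
    return None                           # no probed node can help: job lost

rng = random.Random(7)
work = [5.0, 0.5, 4.0, 3.0]               # unfinished work at nodes 0..3
sent_blind = qdls(0, work, 2.0, 0.5, {0: {2: 1.0}}, rng)   # node 2 is too busy
sent_probed = probing(0, work, 2.0, 0.5, 3, rng)           # finds node 1
```

In this example the blind transfer loses the job because the chosen node cannot start it in time, whereas probing all three other nodes locates the one node that can.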

We  will  see  that  for  a  relatively  wide  range  of  system  parameters,  the  simple  approaches  studied 
perform  significantly  better  than  the  case  of  no  load  sharing  and  often  perform  remarkably  close  to 
that  of  the  theoretically  optimum  algorithm.  Our  conclusion  thus  complements  previously-established 
results  for  LS  in  non-real-time  systems  [7]:  very  simple  approaches,  which  use  only  a  minimal  amount 
of  state  information  and  have  an  extremely  simple  decision-making  process  (and  hence  are  simple  to 
implement)  are  often  sufficient  to  provide  effective  load  sharing  in  a  distributed  real-time  computer 
system. 

2  Previous  Work  on  Real-Time  Load  Sharing


We can classify previous efforts in the area of load sharing in real-time systems into two classes:
those that adopt the multiprocessor model and those that adopt the distributed system model. In the
multiprocessor model, jobs arrive at an omniscient centralized controller which matches (schedules)
the jobs to the processors. Typically, the set of job arrival times, timing constraints and execution
times is known a priori to the centralized scheduler. In the distributed system model (adopted in
this paper), jobs may arrive to any node in the system and a node has no a priori information about
future arrival times of jobs nor about the state of the other nodes in the network.

The work of Muntz and Coffman [18] and Leinbaugh [12] adopts the multiprocessor model. These
efforts are directed towards determining a minimum system configuration which can support the
specified job load for a given process-to-processor scheduling policy. Real-time multiprocessor scheduling
has also been examined in [17], in which a graph model is used to represent timing constraints among a
set of periodic tasks. In [2], an approximate algorithm is presented for optimally scheduling n periodic
tasks on m processors. The real-time scheduling problem for multiprocessors was also considered in
[16], although the performance metrics adopted in [16] (essentially, an equal average load at each node)
are perhaps more applicable in a non-real-time environment.

There  have  been  relatively  few  previous  efforts  adopting  the  distributed  system  model  of  real  time 
load  sharing,  and  it  is  clear  that  work  in  this  area  has  just  recently  begun.  For  example,  the  explicit 
purpose  of  (13|  is  the  study  pf  real  time  scheduling  in  a  uniprocessor  environment  as  a  precursor  to 
examining  similar  issues  in  a  distributed  environment.  In  [26]  [22],  a  specific  load  sharing  scheme  for 
real  time  systems  is  proposed  and  its  performance  exan^ined  through  simulation.  The  load  sharing 
policy  introduced  in  (26]  [22]  is  based  on  the  use  of  focused  addressing  and  bidding  and  is  meant  for 
distributed  systems  in  which  real  time  periodic  jobs  are  given  preference  over  other  real  time  jobs. 
In  this  approach,  a  node  which  can  not  guarantee  the  execution  of  a  job  within  the  specified  time 
constraint  permits  other  nodes  to  bid  for  the  execution  of  the  job  and  at  the  same  time  may  transfer 
the  job  to  that  node  (or  set  of  nodes)  which  are  estimated  to  be  most  likely  to  be  able  to  successfully 
execute  the  job.  Although  this  sophisticated  algorithm  was  shown  to  perform  quite  well,  it  is  closely 
tied  to  the  notion  of  periodic  tasks.  Also,  the  authors  do  not  consider  the  performance  of  the  bidding 
scheme relative to all but the simplest of other possible approaches. In this paper we demonstrate that,
in  fact,  simple  approaches  may  perform  as  well  as  the  more  sophisticated  approaches  over  a  wide  range 
of  system  parameters. 

3 The Model of the Distributed System

Figure 1: Model of a distributed system

Our model of the distributed system is shown in figure 1. The system consists of N nodes which
are interconnected through a communications network; the network is assumed to be logically fully
connected in that every node can communicate with every other node. A stream of jobs is submitted
locally, to node i. Unless stated otherwise, we will assume that the nodes are heterogeneous in the
sense that each node may have a different arrival rate of externally submitted jobs, but homogeneous
in the sense that a job submitted at any node in the network can be processed at any other node in
the network; this latter assumption can be easily relaxed.

We are interested in studying LS policies in a soft real time system, in which a job is lost if it cannot
complete or begin execution (as the case may be) within a given time constraint. If the deadline cannot
be met locally, an LS algorithm may be invoked to transfer the job to another node which can possibly
meet the job's demands. We will assume that a job cannot be transferred more than once, in order
to avoid the problem of "thrashing", and assume that a constant delay, d (representing communication
and transfer processing delays), is required to transfer a job from one node to another. Thus, if a job
first arrives at node i with an initial time constraint of K1 and is transferred to another node j for
processing, its new time constraint at node j, which we will denote K2, will be equal to K1 - d.

4 LS for Real Time Tasks with Bounded Queueing Time

As mentioned earlier, real time tasks with bounded waiting time are time constrained such that
a job must begin execution within K1 time units of its initial arrival. For the above mentioned jobs we
will examine two simple approaches:

• quasi-dynamic load sharing (QDLS) [15,19,21,24,7].

• probing [7].

which have been previously studied for non-real-time systems and compare their real-time performance
with that of the bounding cases of no load sharing and the theoretically optimum real-time LS
algorithm.

As discussed in [7], an LS approach can be characterized by its transfer policy and its location
policy. The transfer policy determines when a job should be transferred for remote execution. The
location policy determines where (i.e., at which remote node) a transferred job will be executed.

Both  approaches  examined  have  the  same  simple  transfer  policy: 

• Transfer policy (QDLS and probing): A job is transferred from node i to a remote node if
and  only  if  the  unfinished  workload  of  the  jobs  currently  at  node  i  exceeds  the  time  constraint 
for  the  job.  A  job  will  thus  either  queue  for  service  at  the  node  at  which  it  initially  arrives  (in 
which  case  it  will  be  guaranteed  execution)  or  will  be  transferred  to  some  remote  node.  We  note 
that  the  transfer  policy  decision  is  made  dynamically,  based  on  the  current  state  of  the  node. 
We  are  not  aware  of  any  previous  analytic  studies  which  have  considered  this  transfer  policy  in 
a  real-time  environment. 

The  location  policies  of  QDLS  and  probing  are: 

•  Location  policy  (QDLS):  If  a  job  is  to  be  transferred,  a  remote  “target"  node  (to  which  the  job 
is  sent)  is  chosen  probabilistically  and  independent  of  the  current  state  of  the  remote  nodes.  Note 
that  QDLS  requires  no  non-local,  dynamic  state  information.  Although  this  location  policy  has 
been  extensively  studied  for  the  non-real-time  case  [15,19,21,24,7],  we  are  not  aware  of  previous 
analytic  work  addressing  this  problem  in  a  real-time  environment. 

•  Location  policy  (probing):  When  a  job  is  to  be  transferred  a  node  probes  some  specified 
number  of  other  system  nodes  (chosen  at  random)  to  determine  if  one  of  them  can  currently 
guarantee  execution  of  this  job,  i.e.,  has  an  amount  of  unfinished  work  less  than  the  time  constraint 
of  the  job  minus  the  transfer  delay.  A  node  may  probe  up  to  some  maximum  number,  Lp,  (the 
probe limit) of other nodes. If none of the probed nodes can execute the job, the job is lost.
We note that probing may be considered a simplified form of bidding [22]. The probing policy
studied here was first analytically examined in [7] (for non-real-time systems) and we follow their
methodology when studying the system-level model (but not the node-level model) of probing.
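
The transfer and location policies described above can be illustrated with a minimal Python sketch. The function names, the list-based representation of per-node unfinished work, and the choice to probe distinct nodes (sampling without replacement) are assumptions made for illustration only; they are not taken from the paper.

```python
import random


def should_transfer(local_work, k1):
    """Transfer policy: transfer iff the local unfinished work exceeds the job's time constraint."""
    return local_work > k1


def qdls_target(origin, weights, rng):
    """QDLS location policy: choose a remote node with fixed probabilities,
    without looking at the current state of any remote node."""
    candidates = [j for j in range(len(weights)) if j != origin]
    return rng.choices(candidates, weights=[weights[j] for j in candidates], k=1)[0]


def probing_target(origin, work, k1, d, probe_limit, rng):
    """Probing location policy: probe up to probe_limit randomly chosen remote nodes
    (assumed distinct here) and return the first whose unfinished work is below K1 - d.
    Returns None when no probed node can guarantee the job, i.e. the job is lost."""
    candidates = [j for j in range(len(work)) if j != origin]
    for j in rng.sample(candidates, min(probe_limit, len(candidates))):
        if work[j] < k1 - d:
            return j
    return None
```

For example, with `work = [0.9, 0.8, 0.1, 0.7]`, K1 = 0.5 and d = 0.2, a job arriving at node 0 must be transferred, and only node 2 (unfinished work below 0.3) can accept it.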

In the analytic performance models developed in the following sections, we will ignore several aspects
of LS approaches which, in practice, may influence their performance. Specifically, we will ignore the
processor overhead required to transfer jobs as well as the overhead and time delays required to probe
a set of nodes. We will also assume in our analytic model of probing (but not in our simulation model
used for validation), that a remote node which responds positively to a probing message will always be
able to execute the transferred job, even though that node's workload may change between the time it
sends a positive response and the time a transferred job arrives. Our reason for ignoring these details
is that, as in [7], rather than analyzing the absolute performance of a specific LS algorithm, we are
instead interested in analyzing the relative performance of a set of LS approaches as a function of their
complexity.  In  particular,  we  are  interested  in  examining  possible  performance  differences  between 
simple  probing,  a  more  sophisticated  approach  towards  LS  and  a  theoretically  optimum  LS  algorithm. 
Ignoring  the  overhead  effects,  a  more  complex  approach  can  at  best  achieve  a  performance  level  falling 
between  probing  and  the  theoretical  optimum.  If  this  gap  is  small  (as  we  find  is  often  the  case),  the 
performance  of  probing  and  any  more  complicated  approach  are  necessarily  close.  When  overhead  is 
considered,  the  small  performance  difference  between  probing  and  a  more  complex  approach,  which 
requires  additional  communication  and  computational  overhead,  can  only  become  smaller.  Thus,  our 
abstract  models  do  provide  the  basis  for  a  meaningful  comparison  of  the  relative  performance  of  real 
time  load  sharing  strategies.  We  also  note  that  when  the  effects  of  overhead  that  we  have  not  modeled 
are  negligible  (as  our  simulation  results  demonstrate  can  be  the  case),  our  analytic  models  also  provide 
a  means  for  assessing  the  absolute  real  time  performance  of  an  LS  approach  as  well. 

4.1  Performance  Models  of  the  QDLS  and  Probing  LS  Algorithms 

In this section we develop analytic models in order to quantitatively assess the real time performance
of the QDLS and probing LS policies. As a first step, in section 4.2 we develop a performance model
for predicting the steady state job loss from the "generic node" shown in figure 2, without reference to
any specific LS policy. This model is then used in section 4.3.1 to predict the real time performance of
a system in which no load sharing (NLS) occurs. Then, adopting the methodology introduced in [7,25]
(with modifications to permit us to examine QDLS in a heterogeneous system), the generic model of
a node in isolation is extended in sections 4.3.2 and 4.3.3 to provide a system-level model for
studying the performance of QDLS and probing.

4.2  Job  Loss  from  a  Generic  Node 

Figure 2 shows our "generic" model of an individual system node. It consists of an upper queue
and a lower queue, connected to a single server, representing the computational resource at a node. A
job arriving to the lower queue with an execution time of x must complete execution within K1 + x
time units; a job entering the upper queue must complete within K2 + x. Equivalently, a job arriving
to the lower (upper) queue must begin service within an amount of time K1 (K2) after its arrival;
otherwise it will be lost.

Figure 2: A generic node in isolation

The server has a mean service rate of $\mu$ jobs/sec and the service policy is FCFS across all the jobs
belonging to both the queues; we note that in the case that the difference between K1 and K2 is
small (as will often be the case when we use this model in an LS context), FCFS closely approximates a
"shortest deadline first" scheduling policy. We assume that the arrival of jobs at the lower and upper
queues can be modeled by Poisson processes with mean rates $\lambda_1$ and $\lambda_2$, respectively.
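
The generic node can also be explored by simulation. The following is a minimal sketch, not the simulator used in the paper: it tracks only the unfinished work seen by each arrival, and it assumes exponentially distributed service times with rate mu. The function name, its parameters, and the seed handling are illustrative assumptions.

```python
import random


def simulate_generic_node(lam1, lam2, mu, k1, k2, n_arrivals, seed=0):
    """Estimate the fraction of jobs lost at the lower and upper queues.

    Assumes Poisson arrivals (rates lam1, lam2), exponential service (rate mu)
    and FCFS service, so only the unfinished work needs to be tracked.
    """
    rng = random.Random(seed)
    total = lam1 + lam2
    work = 0.0                      # unfinished work seen by the next arrival
    arrivals = [0, 0]               # index 0: lower queue, index 1: upper queue
    lost = [0, 0]
    for _ in range(n_arrivals):
        # The server works off the backlog during the interarrival time.
        work = max(0.0, work - rng.expovariate(total))
        stream = 0 if rng.random() < lam1 / total else 1
        arrivals[stream] += 1
        limit = k1 if stream == 0 else k2
        if work <= limit:
            work += rng.expovariate(mu)   # job joins the queue
        else:
            lost[stream] += 1             # job cannot begin service in time
    return [lost[s] / arrivals[s] if arrivals[s] else 0.0 for s in (0, 1)]
```

Calling `simulate_generic_node(0.5, 0.0, 1.0, 1.0, 0.8, 200000)` gives an estimate of the loss fraction at the lower queue of an isolated node with no transferred traffic.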

The  problem  of  queues  with  impatient  customers  has  been  well-studied  in  the  field  of  operations 
research. Gavish et al. [8] study an FCFS M/M/1 system where arriving customers are admitted only
if their waiting times plus service times do not exceed some fixed amount. Baccelli et al. [1] study
a  single-server  system  in  which  a  customer  is  lost  when  its  waiting  time  exceeds  a  random  threshold. 
They  derive  equations  for  several  configurations  of  arrival  rates,  service  time  distributions  and  patience 
thresholds  (time  constraints).  As  we  will  be  only  interested  in  determining  the  probability  of  customer 
loss,  we  may  adopt  a  simpler  approach  than  these  efforts. 

Thus, let F(w,t) denote the probability that at time t, the unfinished work in both queues is less
than or equal to w. Without loss of generality we also assume that K1 > K2. If B(x) denotes the PDF
of the service time distribution, then following the approaches in [8] [10], we can derive the following
time-evolution equations for $F(w, t + \Delta t)$. We consider three different regions.

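In the region $0 < w < K_2$ we have,

$$
\begin{aligned}
F(w, t + \Delta t) ={}& (1 - \lambda_1 \Delta t)(1 - \lambda_2 \Delta t)\, F(w + \Delta t, t) \\
&+ \lambda_1 \Delta t\,(1 - \lambda_2 \Delta t) \int_0^{w} B(w - u)\, d_u F(u, t) \\
&+ \lambda_2 \Delta t\,(1 - \lambda_1 \Delta t) \int_0^{w} B(w - u)\, d_u F(u, t)
\end{aligned}
$$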

The expression on the left hand side gives the probability that the unfinished work in the queue at
time $t + \Delta t$ is less than w. This condition can be realized in several ways. First, no jobs may have
arrived in the interval $[t, t + \Delta t]$. In this case, there must have been an amount of work less than $w + \Delta t$
at time t. The second term on the right hand side of the above equation denotes the probability that
exactly one arrival (in time $\Delta t$) at the lower queue brings new unfinished work to the queue such that
the unfinished work at $t + \Delta t$ is less than w. Similarly, the third term denotes the probability of exactly
one arrival at the upper queue such that the new unfinished work at $t + \Delta t$ is less than w.

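In the region $K_2 < w < K_1$ we have:

$$
\begin{aligned}
F(w, t + \Delta t) ={}& (1 - \lambda_1 \Delta t)(1 - \lambda_2 \Delta t)\, F(w + \Delta t, t) \\
&+ \lambda_1 \Delta t\,(1 - \lambda_2 \Delta t) \int_0^{w} B(w - u)\, d_u F(u, t) \\
&+ \lambda_2 \Delta t\,(1 - \lambda_1 \Delta t) \left[ F(K_2, t)\, \frac{\int_0^{K_2} B(w - u)\, d_u F(u, t)}{F(K_2, t)}
 + \bigl(1 - F(K_2, t)\bigr)\, \frac{F(w + \Delta t, t) - F(K_2, t)}{1 - F(K_2, t)} \right]
\end{aligned}
$$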


The first two terms of the above equation are similar to those described above. The third term
is for the case of an arrival at the upper queue in the interval $[t, t + \Delta t]$. With probability F(K2,t),
the job joins the queue. In this case, the new work brought in by the job plus the unfinished work at
time t must be less than w. Note that the probability of this latter event must be conditioned on the
event of the unfinished work at time t being less than K2. Similarly, with probability 1 - F(K2,t)
the arriving job finds an amount of work greater than K2, and hence does not join the queue. In this
case the unfinished work at time t must have been less than $w + \Delta t$. This probability must also be
conditioned, this time on the fact that an arriving job did not join the queue.

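Rearranging the terms in the above two equations and taking limits as $\Delta t \to 0$, we get:

$$
\frac{\partial F(w,t)}{\partial t} = \frac{\partial F(w,t)}{\partial w}
 - (\lambda_1 + \lambda_2) F(w,t) + (\lambda_1 + \lambda_2) \int_0^{w} B(w - u)\, d_u F(u,t),
 \qquad 0 < w < K_2
$$

$$
\frac{\partial F(w,t)}{\partial t} = \frac{\partial F(w,t)}{\partial w}
 - \lambda_1 \left[ F(w,t) - \int_0^{w} B(w - u)\, d_u F(u,t) \right]
 - \lambda_2 \left[ F(K_2,t) - \int_0^{K_2} B(w - u)\, d_u F(u,t) \right],
 \qquad K_2 < w < K_1
$$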


and taking limits as $t \to \infty$, we obtain the following steady state equations:

$$
\frac{dF(w)}{dw} = (\lambda_1 + \lambda_2) \left[ F(w) - \int_0^{w} B(w - u)\, d_u F(u) \right],
 \qquad 0 < w < K_2 \qquad (1)
$$

$$
\frac{dF(w)}{dw} = \lambda_1 \left[ F(w) - \int_0^{w} B(w - u)\, d_u F(u) \right]
 + \lambda_2 \left[ F(K_2) - \int_0^{K_2} B(w - u)\, d_u F(u) \right],
 \qquad K_2 < w < K_1 \qquad (2)
$$


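In order to provide a check of the above equations, we show in Appendix A that equation 2 can
be independently derived in a different manner using level crossing arguments [4]. In the case that job
execution times are exponentially distributed with a mean of $1/\mu$, the solution to equations 1 and 2,
writing $\lambda = \lambda_1 + \lambda_2$, is given by:

$$
\begin{aligned}
F(w) &= F(0^+)\, \frac{\mu - \lambda e^{-(\mu - \lambda) w}}{\mu - \lambda}, & 0 < w < K_2 \\
F(w) &= F(K_2) + F(0^+)\, \frac{\lambda e^{-(\mu - \lambda) K_2}}{\mu - \lambda_1}
 \left[ 1 - e^{-(\mu - \lambda_1)(w - K_2)} \right], & K_2 < w < K_1
\end{aligned}
\qquad (3)
$$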

At this point, we could now proceed in a similar fashion to derive an expression for F(w) in the
region K1 < w. However, if we are only interested in computing the fraction of jobs lost, we can derive
a simpler third equation by considering flow conservation across the boundaries shown in figure 2. The
total flow into the node consists of the sum of $\lambda_1$ and $\lambda_2$, while the total flow out of the node consists
of a departure stream from the server and the two loss streams, one from each of the queues. Hence,

$$
\lambda_1 + \lambda_2 = \{1 - F(0^+)\}\,\mu + \{1 - F(K_2)\}\,\lambda_2 + \{1 - F(K_1)\}\,\lambda_1 \qquad (4)
$$

We can now solve the set of simultaneous equations 3 through 4 to numerically obtain F(K1), F(K2)
and F(0+).
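
As an illustration of how the steady state quantities might be computed, the following Python sketch treats the exponential service case only. It relies on an assumption that is not quoted from the paper: with service rate mu, the steady state equations admit closed forms F(K2) = a F(0+) and F(K1) = b F(0+), so that the flow conservation equation becomes linear in F(0+). The function names are hypothetical.

```python
import math


def _exp_integral(c, w):
    """Integral of exp(-c*u) for u in [0, w]; also valid when c == 0."""
    return w if c == 0.0 else -math.expm1(-c * w) / c


def solve_generic_node(lam1, lam2, mu, k1, k2):
    """Return (F(0+), F(K2), F(K1)) for exponential service with rate mu.

    Assumed closed forms (derived for this sketch, with lam = lam1 + lam2):
      density on (0, K2):  lam * F(0+) * exp(-(mu - lam) * w)
      density on (K2, K1): continues from K2 with decay rate (mu - lam1)
    F(0+) then follows from flow conservation:
      lam1 + lam2 = (1 - F(0+)) mu + (1 - F(K2)) lam2 + (1 - F(K1)) lam1
    """
    lam = lam1 + lam2
    a = 1.0 + lam * _exp_integral(mu - lam, k2)                     # F(K2) / F(0+)
    b = a + lam * math.exp(-(mu - lam) * k2) * _exp_integral(mu - lam1, k1 - k2)  # F(K1) / F(0+)
    f0 = mu / (mu + lam1 * b + lam2 * a)
    return f0, f0 * a, f0 * b
```

As a sanity check on the sketch, very large time constraints with no transferred traffic recover the familiar M/M/1 idle probability: `solve_generic_node(0.5, 0.0, 1.0, 60.0, 59.0)` returns F(0+) close to 0.5 and F(K1) close to 1.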


4.3  Incorporating  a  Generic  Node  Model  into  a  Distributed  System  Model 

We  now  incorporate  our  model  of  a  generic  node  in  isolation  into  a  system-level  model  in  order  to 
study  the  performance  of  no  load  sharing,  QDLS,  and  probing.  In  each  of  our  models,  each  system 
node will be represented by the generic node model of figure 2. The arrival rates to the lower and
upper queues at node i will be denoted $\lambda_1^i$ and $\lambda_2^i$, respectively, and $F_i(w)$ will denote the PDF of the
unfinished work at node i; note that although $F_i(w)$ is a function of $\lambda_1^i$, $\lambda_2^i$, K1, and K2, we have not
indicated this functional dependence directly.

When incorporating our model of a generic node into a system-level model, we let the arrivals to
the lower queue at node i, $\lambda_1^i$, represent the "external" arrivals of jobs to node i; these "external"
arrivals, with an initial time constraint (until execution begins) of K1, represent jobs which are first
submitted to the system at node i and are inputs to our model, specifying the load on the distributed
system. With probability $1 - F_i(K_1)$, an externally arriving job at node i will not be able to meet
its time constraint locally (i.e., at node i). In this case, the job will either be lost or will be sent to
another node for possible execution, depending on the load sharing policy employed.

The arrivals to the upper queue at node i, $\lambda_2^i$, represent the arrival of "internally transmitted"
jobs to node i, i.e., the arrival of jobs which have been transferred to node i from other nodes in the
system, and thus depend on the LS scheme used. The time constraint (until execution begins) for these
internally transmitted jobs is K2. Recall that d is the fixed delay associated with a job transfer and
thus K2 = K1 - d. Since a job can be transferred at most once, a job which arrives to the upper queue
and cannot meet its deadline is unconditionally lost.

4.3.1  Job  Loss  with  No  Load  Sharing  (NLS) 

With no load sharing, no jobs are transferred between nodes and hence $\lambda_2^i = 0$ for all nodes i. In
this case, we can solve equations 3 and 4 for $F_i(0^+)$, $F_i(K_1)$ and $F_i(K_2)$ for a given $\lambda_1^i$ and compute
the system-wide loss by:

$$
\text{loss rate} = \sum_{i=1}^{N} \lambda_1^i \bigl(1 - F_i(K_1)\bigr) \qquad (5)
$$

4.3.2  Job  Loss  with  Quasidynamic  Load  Sharing  (QDLS) 

Recall that in our QDLS approach, when an externally arriving job arrives at a node and cannot
finish execution within the time constraint, K1 + x, it is transferred to a probabilistically-chosen remote
node for possible execution. In our system-level model of QDLS then, all jobs exiting before joining the
lower queue in figure 2 are transferred to another node for possible remote execution. Let $\lambda^{ij}$ denote
the job transfer rate from node i to node j and let $\lambda^{ii}$ represent the rate of externally arriving jobs that are
executed locally. Given the QDLS scheme and given that an externally arriving job is transferred if
and only if it cannot be executed locally, we have the following flow constraints:

• $\lambda_1^i \bigl(1 - F_i(K_1)\bigr) = \sum_{j=1,\, j \ne i}^{N} \lambda^{ij}$

• $\lambda_1^i F_i(K_1) = \lambda^{ii}$

• $\lambda_2^j = \sum_{i=1,\, i \ne j}^{N} \lambda^{ij}$, for all nodes j.

Given a set of flows satisfying the above constraints, the system-wide job loss under QDLS can be
easily computed. Since all job loss at node i can only occur in the upper queue, we can first solve
equations 3 and 4 for $F_i(0^+)$, $F_i(K_1)$ and $F_i(K_2)$ and then compute the system-wide loss under QDLS
by:

$$
\text{loss rate} = \sum_{i=1}^{N} \sum_{j=1,\, j \ne i}^{N} \lambda^{ij} \bigl(1 - F_j(K_2)\bigr)
 = \sum_{j=1}^{N} \lambda_2^j \bigl(1 - F_j(K_2)\bigr) \qquad (6)
$$

Clearly then, the system-wide rate at which jobs are lost depends on the values of $\{\lambda^{ij}\}$ (both directly
as shown above in equation 6 and indirectly through the dependence of $F_i(K_2)$ at node i on $\{\lambda^{ij}\}$ and
$\{\lambda^{ii}\}$). Thus, we are interested in determining the values of $\{\lambda^{ij}\}$ which minimize equation 6 subject to
the flow constraints; this can be accomplished using any constrained optimization procedure, including
the procedure described in [11].

Finally, we note that unlike [7], we have not assumed a system of homogeneous nodes; this
necessitated the use of an optimization procedure. We have, however, adopted two assumptions introduced
in [7] in deriving equation 6. First, we have assumed that the individual $F_i(\cdot)$'s are independent; that
is, a job's probability of being executed within its deadline at one node is independent of the state of
the other system nodes. A second assumption is that the arrival process at each of the upper queues,
which is formed by the superposition of the overflow processes of the other system nodes, is Poisson.
We note that these assumptions become asymptotically correct as the number of system nodes gets
very large [7,25] or as the ratio $\lambda_2^i / \lambda_1^i$ becomes very small. We also note that, as shown in section 5,
for N equal to 20 nodes, our simulation studies yield performance results which are extremely close to
those predicted by the analysis, thus corroborating the appropriateness of our analytic approximations.

4.3.3  Job  Loss  with  Probing 


For the case of probing, we follow [7] directly and obtain analytic results for the homogeneous case
in which $\lambda_1^i$ is identical for all nodes i. As a consequence the steady state probabilities $F_i(K_2)$, $F_i(K_1)$,
etc. will also be identical for all nodes. Since a job is lost only if it cannot be executed locally within
K1 time units and some Lp (the probe limit) other nodes are probed at random, each of which is then
found to have a current load of unfinished work greater than K2, we have:

$$
\text{loss rate} = \lambda_1 \bigl(1 - F(K_1)\bigr)\bigl(1 - F(K_2)\bigr)^{L_p} \qquad (7)
$$

Note that we cannot yet solve equations 3 and 4 for F(K2), and thereby compute the loss using 7, for
the $\lambda_2$ which results from the probing policy is still unknown. Again considering the homogeneity of the
system, we note that the steady state transfer rates out of all nodes must be identical; similarly, the
rate at which jobs are transferred into the nodes must also be equal. This then implies that the steady
state flow of jobs out of any given node must equal the steady state flow of internally transmitted jobs
that are accepted and successfully executed at this node. We thus have the following flow constraint
for all nodes:

$$
\lambda_1 \bigl(1 - F(K_1)\bigr) - \lambda_1 \bigl(1 - F(K_1)\bigr)\bigl(1 - F(K_2)\bigr)^{L_p} = \lambda_2 F(K_2) \qquad (8)
$$

where we have dropped the i superscripts and subscripts since the system is homogeneous. Equations
3, 4, and 8 provide four equations in four unknowns (F(K1), F(K2), F(0+) and $\lambda_2$). We can now solve
this set of simultaneous equations for F(K1) and F(K2) and directly compute the loss using equation
7.
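
One way to carry out this computation numerically is sketched below in Python. This is an illustrative approach, not the procedure used in the paper: it assumes exponential service with rate mu (so that the generic node has the closed-form solution coded in `solve_generic_node`, itself an assumption of this sketch) and finds the transferred arrival rate lam2 satisfying the flow constraint by bisection. All function and parameter names are hypothetical.

```python
import math


def _exp_integral(c, w):
    """Integral of exp(-c*u) for u in [0, w]; also valid when c == 0."""
    return w if c == 0.0 else -math.expm1(-c * w) / c


def solve_generic_node(lam1, lam2, mu, k1, k2):
    """(F(0+), F(K2), F(K1)) for exponential service; closed form assumed for this sketch."""
    lam = lam1 + lam2
    a = 1.0 + lam * _exp_integral(mu - lam, k2)
    b = a + lam * math.exp(-(mu - lam) * k2) * _exp_integral(mu - lam1, k1 - k2)
    f0 = mu / (mu + lam1 * b + lam2 * a)
    return f0, f0 * a, f0 * b


def probing_loss(lam1, mu, k1, d, probe_limit):
    """Return (lam2, per-node loss rate) for a homogeneous system under probing."""
    k2 = k1 - d

    def residual(lam2):
        # Flow constraint: jobs successfully transferred out minus jobs accepted in.
        _, fk2, fk1 = solve_generic_node(lam1, lam2, mu, k1, k2)
        return lam1 * (1 - fk1) * (1 - (1 - fk2) ** probe_limit) - lam2 * fk2

    lo, hi = 0.0, 1.0
    while residual(hi) > 0.0:       # expand the bracket until the sign changes
        hi *= 2.0
    for _ in range(100):            # plain bisection on lam2
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam2 = 0.5 * (lo + hi)
    _, fk2, fk1 = solve_generic_node(lam1, lam2, mu, k1, k2)
    return lam2, lam1 * (1 - fk1) * (1 - fk2) ** probe_limit
```

For instance, `probing_loss(0.5, 1.0, 1.0, 0.2, 3)` returns the transferred arrival rate and the per-node loss rate for a probe limit of 3; dividing the loss rate by `lam1` gives the fraction of jobs lost.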


4.4  Numerical  Results 

In  this  section  we  present  representative  performance  results  for  the  QDLS  and  probing  real  time 
load  sharing  schemes  and  compare  their  performance  with  that  of  the  ideal  case  of  perfect-information 
load sharing and the case of no load sharing (NLS). We consider a 20 node system (the same size
considered in [7]), in which $\mu$ = 1 job/second and a delay of d = 0.2, i.e., the transfer delay is 20% of
the job execution time.

We model the "ideal" case as an M/M/20 queueing system with a time constraint of K1 and have
obtained the M/M/20 performance results through simulation. We note that in the M/M/20 system,
jobs are scheduled to available processors using complete information about the system state and incur
no transfer delay. Thus, the "ideal" performance bounds shown in the subsequent results are, in reality,
unattainable. This will be evident in our performance results, where for large values of K1 and heavy
external arrival rates, the QDLS and the probing curves approach limiting values which are slightly
above and to the left (i.e., poorer performance) of the upper bound predicted by our "ideal" case of
the M/M/20 queue.

Figure 3 shows the fraction of jobs lost as a function of the average system arrival rate
under the QDLS policy. Performance results are presented for different values of the initial time
constraint, K1 (0.5 sec. and 5.0 sec.) and transfer delays of 0.1 and 0.2 time units (10% and 20%
of the average job execution time). The system load was asymmetric, with half the nodes having an
average arrival rate above the system average while the remaining half had an average arrival rate
correspondingly below it. We
should also note that in order to test our numerical, optimization, and simulation procedures, we first
studied QDLS in a completely symmetric system (not shown). As expected, the optimum $\{\lambda^{ij}\}$, $i \ne j$,
were found to be equal, as were the optimum $\{\lambda^{ii}\}$.

Several  properties  of  the  QDLS  policy  are  evident  from  Figure  3.  First,  note  that  even  for  the  most 
stringent timing conditions (K1 = 0.5 and d = 0.2) QDLS performs significantly better than no load
sharing. We additionally note that as the asymmetry in the arrival rates increases, QDLS performs
increasingly better than no load sharing. While these results might not seem surprising at first, we
note that QDLS is perhaps the simplest of all possible real time LS approaches and makes use of no
non-local dynamic state information. We also note that for a time constraint of 5.0 (i.e., where all jobs
must  begin  execution  within  5  times  their  average  execution  time),  the  performance  of  this  simplest  of 
all  LS  policies  approaches  that  of  the  “ideal”  case.  Moreover,  this  performance  difference  is  particularly 
small  in  the  system  load  regions  of  practical  interest,  in  which  the  arrival  rate  of  jobs  to  the  system  is 
less  than  70%  of  the  physical  capacity  of  the  system.  Figure  3  also  indicates  that  the  ideal  curves  show 
a  knee  at  an  average  load  of  1.0;  above  this  point  the  job  arrival  rate  exceeds  the  system’s  capacity 
and  thus  some  jobs  will  necessarily  be  lost.  We  also  note  that  lost  jobs  are  not  executed;  if  these  jobs 
were  to  be  executed,  higher  losses  would  result  since  these  lost  jobs  would  place  additional  demands 
on  the  service  capacity  of  the  nodes. 

Finally, note that we have also plotted simulation results (point values shown as filled squares)
in Figure 3 for K1 = 0.5. These simulation results were obtained without making the independence
assumptions and the Poisson approximation for $\lambda_2^i$ required by the analysis. In the simulation, the
optimized $\lambda^{ij}$'s from the analytic model were used to determine the probabilities with which a
transferred job from node i was sent to some remote node, j. A transferred job arrived at its destination d
time units later with a new time constraint of (K1 - d). Note that the close correspondence between
the simulation and analytic results corroborates our earlier belief that the approximations introduced
were indeed justifiable. In the case that d was chosen to be a smaller value (e.g., d = 0.1, not shown
here), the simulation and analytic results were found to match even more closely.

The  real  time  performance  of  the  probing  approach  is  demonstrated  in  figure  4  for  probe  limits, 
Lp  =  1,3,5.  As  expected  the  performance  of  probing  approaches  the  ideal  limit  as  Lp  increases. 
Note, however, that a relatively small probing limit (Lp = 5 when K1 = 0.5 (an extremely tight
time constraint) and Lp = 3 when K1 = 5.0), results in a real time performance extremely close to
the unachievable upper bound. Also note that increasing the probing limit beyond a relatively small
number can result at best in only a marginal performance improvement; this results from the fact
that the probability that a job (which is to be transferred out) is accepted by the m-th probed node
is given by $F(K_2)[1 - F(K_2)]^{m-1}$, and this probability decreases rapidly with increasing m. We may
conclude  then  that  since  additional  probing  beyond  some  small  probe  limit  incurs  additional  overhead, 
a  relatively  small  probe  limit  would  be  sufficient  in  practice  to  implement  effective  real  time  load 
sharing. 
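
The diminishing return from additional probes can be made concrete with a short Python sketch. The function names are hypothetical, and the acceptance probability F(K2) is simply passed in as a number; the value used in the usage note below is an arbitrary illustration, not a result from the paper.

```python
def acceptance_by_probe(f_k2, probe_limit):
    """Probability that the m-th probed node is the one that accepts, for m = 1..probe_limit.

    Uses F(K2) * (1 - F(K2))**(m - 1), which assumes independent node states.
    """
    return [f_k2 * (1.0 - f_k2) ** (m - 1) for m in range(1, probe_limit + 1)]


def prob_no_acceptor(f_k2, probe_limit):
    """Probability that none of the probed nodes can accept the job."""
    return (1.0 - f_k2) ** probe_limit
```

With an assumed F(K2) of 0.5, `acceptance_by_probe(0.5, 3)` evaluates to `[0.5, 0.25, 0.125]`, so each further probe contributes half as much as the one before it, and `prob_no_acceptor(0.5, 3)` is 0.125.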

Once again, simulations were performed to validate our analysis. The results (plotted as filled
squares) in Figure 4 are shown for K1 = 0.5 and Lp = 3. As with the QDLS simulations, the simulations
were performed without assuming independence among the states of the system nodes or a Poisson
arrival rate at the upper queue. We also note that in our simulations, if a node responded positively
to a probe and its workload had increased beyond K2 within the d time units required to transfer a
job, the transferred job was simply lost, as would be the case in a real system. Again, we note that the
close correspondence between our simulation and analytic results indicates that reasonable modeling
assumptions and approximations were made in the development of our analytic model of probing.

Figure 5 indicates the dependence of the fraction of jobs lost on the time constraint, K1. Performance
results are shown for two different values of a symmetric load (an average of 0.4 and 1.2 external
arrivals/time unit per station), and a fixed transfer delay of 0.1. For $\lambda$ = 0.4 the curve for the ideal
case lies along the x-axis. As expected, as K1 increases, the performance of QDLS and probing with
Lp = 3 approaches the upper performance bound. More importantly, note that the performance with
Lp = 3 is close to the ideal limiting performance values for even very stringent time constraints.

In summary, Figures 3 through 5 provide a quantitative basis for addressing the question of
determining the appropriate level of complexity for LS algorithms. We note that a more complex approach
can at best achieve a performance level falling in the gap between our probing results and the
theoretical optimum. For system parameters of practical interest (i.e., a system loading less than the
physical  capacity  of  the  system  and  time  constraints  on  the  order  of  the  service  time  of  a  job),  this 
gap  has  been  shown  to  be  quite  small.  If  the  overhead  we  have  not  modeled  is  to  be  considered,  the 
small  performance  difference  between  probing  and  a  more  complex  approach,  which  requires  additional 
communication  and  computational  overhead,  can  only  become  smaller. 

5  Real-Time  Tasks  With  Bounded  Waiting  Time 

In this section we focus on a second model for real-time jobs, one in which a job must complete
execution within a fixed time after its initial arrival into the system. A job arriving at a node can only
be serviced locally if the time it would spend in the queue plus its service requirement is less than
the deadline; otherwise it is considered lost. As stated earlier, we assume that a job initially arrives
at a node with a fixed time constraint K1, a FCFS scheduling policy at each node and a constant
network delay d associated with the job transferring procedure. Hence transferred jobs must complete
execution at the destination node within K1 - d = K2 time units.

For this system, the transfer and location policies are modified to:

•  Transfer Policy: A job is to be transferred from a node i if and only if the unfinished work at
node i plus the service requirement of the job exceeds its time constraint K1 and its service time
is less than K2.

•  Location Policy: The location policy is the same as in the previous section. For QDLS, a node
probabilistically selects a "target node" and the job is transferred to this node. For probing,
a node probes a fixed number of nodes in the system to determine whether any of them can
guarantee the execution of the job, i.e., if the unfinished work at the destination node plus the
service requirement of the job is less than K2. The job is transferred to the first node which
responds positively.

Figure 5: Convergence of various LS approaches to Ideal for large K1
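
The two policies can be sketched as a pair of predicates. This is a minimal illustration only: the function names, argument names and the list of probed workloads are assumptions introduced here, not part of the original model.

```python
from typing import Optional, Sequence


def must_transfer(local_work: float, service: float, k1: float, k2: float) -> bool:
    """Transfer policy: the job would miss K1 locally but could still fit within K2 remotely."""
    return local_work + service > k1 and service < k2


def probe_for_target(probed_work: Sequence[float], service: float, k2: float) -> Optional[int]:
    """Location policy (probing): index of the first probed node that can guarantee the job."""
    for index, work in enumerate(probed_work):
        if work + service < k2:
            return index
    return None  # every probe was refused, so the job is lost


if __name__ == "__main__":
    k1, d = 3.0, 0.1          # hypothetical deadline and transfer delay
    k2 = k1 - d
    print(must_transfer(2.5, 1.0, k1, k2))
    print(probe_for_target([2.4, 1.0, 0.2], 1.0, k2))
```

With these hypothetical numbers the job cannot be served locally (2.5 + 1.0 exceeds 3.0), the first probed node refuses it, and the second one accepts it.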

Note that for this system, the basis on which a job is denied service depends not only on the current
state of the node but also on the service time requirements of the job. Clearly, tasks requiring a service
time greater than K1 cannot be executed locally. Furthermore, of the tasks which cannot be processed
locally, those requiring a service time greater than K2 will not be accepted by any node. Therefore,
transferring these tasks is futile. This distinguishes the transfer policy used in this system from the
system we had considered in the previous section, in which an attempt was made to transfer each job
that could not be executed locally, regardless of its service time. As we will see, this complicates
the model sufficiently that obtaining a closed form expression for the unfinished work becomes a
difficult task.

In the model developed below, we again ignore the processor overhead required to transfer the job and
the time delay involved in probing. Hence the actual losses realized would in fact be higher than
the computed values. The reason for this omission is that we are again interested only in the relative
performance of the system rather than its absolute performance. Also, the model does not account for
the fact that the state of the node to which a job is transferred may undergo a change during the
time required for the transfer. However, our simulations do account for any changes which may occur
in the system during the time d.

5.1  Performance  Models  for  an  Individual  Node  in  Isolation 

Figure 6 shows the model of an individual node in the system. Externally arriving jobs (with rate
λ1) comprise the lower job arrival stream. Of these jobs, only those whose deadline can be met by the
node are allowed to join the lower queue (hence all jobs in the lower queue have a time constraint equal
to K1). The externally arriving jobs which are not accepted into the lower queue are either lost or
transferred out to other nodes. In the QDLS scheme the upper job arrival stream (with arrival rate
λ2) represents the transferred jobs, whereas in the Probing Policy the upper stream represents probe
arrivals. A probe which is accepted then translates into an arrival to the upper queue. Hence all jobs
in the upper queue have a time constraint of K2. We will assume that arrivals to the lower and upper
queues form Poisson processes with rates λ1 and λ2, respectively.
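
A small Monte Carlo sketch of this node model may help fix ideas. It is an illustration under stated assumptions, not the simulator used in this study: both streams are taken to be Poisson, service times on both streams are drawn from the same exponential distribution (the model itself uses a different distribution for transferred jobs), and all names are hypothetical.

```python
import random


def simulate_node(lam1, lam2, k1, k2, mu=1.0, n_events=200_000, seed=1):
    """Estimate the fraction of external jobs that a single node cannot serve locally."""
    rng = random.Random(seed)
    work = 0.0                      # unfinished work at the node
    external = rejected = 0
    total = lam1 + lam2
    for _ in range(n_events):
        # Work drains at unit rate until the next arrival on either stream.
        work = max(0.0, work - rng.expovariate(total))
        service = rng.expovariate(mu)
        if rng.random() < lam1 / total:      # external arrival (lower stream)
            external += 1
            if work + service <= k1:
                work += service
            else:
                rejected += 1
        elif work + service <= k2:           # accepted transfer or probe (upper stream)
            work += service
    return rejected / external


if __name__ == "__main__":
    print(simulate_node(lam1=0.5, lam2=0.0, k1=1.5, k2=1.4))
```

With no upper stream (lam2 = 0) the estimate cannot fall below the fraction of jobs whose service time alone exceeds K1.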

The bounded waiting time problem for jobs having a single fixed deadline has been extensively examined. For
GI/G/1 systems, Loris-Tegheim [14] obtained the generating functions of the Laplace-Stieltjes transforms
of the distribution of the waiting time, for cases where the random variables give rise to a rational
transform function. Cohen [5] (Model II) also used transform techniques to study the equivalent problem
for M/M/1 and M/D/1 queueing systems. Gavish et al. [8] derived analytical expressions for
the virtual waiting time distribution and the loss incurred by the system, for an M/M/1 system with
an FCFS service discipline. Their method is simpler than the techniques used by Loris-Tegheim [14]
and Cohen [5]. We thus choose to extend the results in [8] to incorporate the second time constraint
needed in our model.

Figure 6: A generic node in isolation

As in the previous section, following the approach of [8] [10], we can derive the time evolution
expression for the distribution of the unfinished work, F(w, t + Δt), in two different regions. Let B(x)
denote the PDF (probability distribution function) of the service time of the externally arriving jobs,
with density b(x). Note that the decision
of whether or not a job can be locally processed depends not only on the current unfinished work at the
node but also on its service time requirements. Furthermore, of the jobs which cannot receive service
locally, only those with service time less than K2 may possibly be successfully executed as a result of
the LS policies. Thus the service time distribution of transferred jobs will no longer be B(x). Let G(x)
denote the PDF of the service time distribution of these transferred jobs.

In the region 0 < w < K2 we have,

\[
\begin{aligned}
F(w, t+\Delta t) ={}& (1-\lambda_1\Delta t)(1-\lambda_2\Delta t)\,F(w+\Delta t, t) \\
&+ \lambda_1\Delta t\,(1-\lambda_2\Delta t)\left[\int_0^{w}\{1-B(K_1-u)\}\,d_uF(u,t) + \int_0^{w} B(w-u)\,d_uF(u,t)\right] \\
&+ \lambda_2\Delta t\,(1-\lambda_1\Delta t)\left[\int_0^{w}\{1-G(K_2-u)\}\,d_uF(u,t) + \int_0^{w} G(w-u)\,d_uF(u,t)\right]
\end{aligned}
\tag{9}
\]

The probability that the unfinished work in the queue at time t + Δt is less than w can evolve from
the state of the queue at time t in the following ways. The first term on the right hand side denotes the
probability of the event that no jobs arrive in the interval [t, t + Δt), in which case there must have
been an amount of work less than w + Δt at time t. The second term is a composite of two terms and
arises in the event that exactly one arrival (in time Δt) occurs at the lower queue. In this case:
(a) if the unfinished work at time t plus the work brought in exceeds K1, the job is rejected and no
additional work is added to the queue; (b) if the unfinished work at time t plus the work brought in is
less than K1, the job joins the queue and brings additional unfinished work to the queue such that the
unfinished work at t + Δt is less than w. The third term is similar to the second one, except that it
holds in the event that an arrival occurs at the upper queue and no arrival occurs at the lower queue
in the interval [t, t + Δt).

In the region K2 < w < K1 we have,

\[
\begin{aligned}
F(w, t+\Delta t) ={}& (1-\lambda_1\Delta t)(1-\lambda_2\Delta t)\,F(w+\Delta t, t) \\
&+ \lambda_1\Delta t\,(1-\lambda_2\Delta t)\left[\int_0^{w}\{1-B(K_1-u)\}\,d_uF(u,t) + \int_0^{w} B(w-u)\,d_uF(u,t)\right] \\
&+ \lambda_2\Delta t\,(1-\lambda_1\Delta t)\,F(w+\Delta t, t)
\end{aligned}
\tag{10}
\]

Equation 10 is similar to Equation 9. Note that since the unfinished work is w > K2 at time t + Δt, a
probe/job arriving at the node in the interval Δt will be declined. Hence, an arrival at the upper queue
adds no work to the node.

Expanding F(w + Δt, t) with respect to the first variable and taking the limit as Δt → 0 we get

\[
\frac{\partial F(w,t)}{\partial t} = \frac{\partial F(w,t)}{\partial w}
 - \lambda_1\int_0^{w}\{B(K_1-u)-B(w-u)\}\,d_uF(u,t)
 - \lambda_2\int_0^{w}\{G(K_2-u)-G(w-u)\}\,d_uF(u,t), \qquad 0 < w < K_2
\]

\[
\frac{\partial F(w,t)}{\partial t} = \frac{\partial F(w,t)}{\partial w}
 - \lambda_1\int_0^{w}\{B(K_1-u)-B(w-u)\}\,d_uF(u,t), \qquad K_2 < w < K_1
\]

and taking limits as t → ∞, we obtain the following steady state equations:

\[
\frac{dF(w)}{dw} = \lambda_1\int_0^{w}\{B(K_1-u)-B(w-u)\}\,d_uF(u)
 + \lambda_2\int_0^{w}\{G(K_2-u)-G(w-u)\}\,d_uF(u), \qquad 0 < w < K_2
\tag{11}
\]

\[
\frac{dF(w)}{dw} = \lambda_1\int_0^{w}\{B(K_1-u)-B(w-u)\}\,d_uF(u), \qquad K_2 < w < K_1
\tag{12}
\]

Equations 11 and 12 can also be directly obtained using level crossing arguments [4] (see Appendix A).

By definition the maximum unfinished work at a node must be less than K1. Thus the normalization
condition becomes,

\[
\int_0^{K_1} f(u)\,du = 1
\tag{13}
\]

So far we have not obtained an explicit expression for the PDF, G(x). In order to do so, we consider
the various job streams shown in Figure 6. A job is denied service locally if the unfinished work in
the queue plus the service time of the job exceeds K1. Hence the probability that a job cannot be
processed locally is given by,

\[
P^{(r)} = \int_0^{K_1}\left[1 - B(K_1-u)\right]d_uF(u),
\]

and the rate at which jobs are rejected from the lower stream of jobs is thus,

\[
\lambda^{(r)} = \lambda_1 P^{(r)}.
\]

Let J(x) be the service time distribution of the jobs in the λ^(r) stream. Then, with b(u) the density
of B(u) and F(y) = 0 for y < 0,

\[
J(x) = \frac{1}{P^{(r)}}\int_0^{x} b(u)\left[1 - F(K_1-u)\right]du.
\]

Of these jobs, only those with service time less than K2 may be transferred out (represented by the
stream λ^(t) in Figure 6). Thus,

\[
G(x) = \frac{J(x)}{J(K_2)} = \frac{\dfrac{1}{P^{(r)}}\displaystyle\int_0^{x} b(u)\left[1 - F(K_1-u)\right]du}{J(K_2)}, \qquad 0 \le x \le K_2
\tag{14}
\]


Using the expression for G(x) given in Equation 14 in Equations 11 and 12, we observe that the pdf, f(w),
is a function of an integral containing F(x) itself. This adds sufficient complexity to Equations 11
and 12 that we resort to numerical techniques to solve for F(w).
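
As a rough numerical sketch of how the service time distribution of transferred jobs could be tabulated from a given F, the ratio J(x)/J(K2) can be evaluated with the trapezoidal rule. The function name, the callable arguments `b` and `F`, and the example distribution used in the usage lines are assumptions made for illustration only.

```python
import math


def transferred_service_cdf(b, F, k1, k2, x, n=2000):
    """G(x) = J(x) / J(K2); the normalising constant P^(r) cancels in the ratio."""
    def unnormalised_j(upper):
        # Trapezoidal rule for the integral of b(u) * (1 - F(K1 - u)) over [0, upper].
        h = upper / n
        total = 0.0
        for i in range(n + 1):
            u = i * h
            weight = 0.5 if i in (0, n) else 1.0
            total += weight * b(u) * (1.0 - F(k1 - u))
        return total * h

    return unnormalised_j(min(x, k2)) / unnormalised_j(k2)


if __name__ == "__main__":
    k1, k2 = 3.0, 2.9
    b = lambda u: math.exp(-u)                      # exponential service density, mu = 1
    # Hypothetical unfinished-work distribution: mass 0.6 at zero, the rest uniform on (0, K1].
    F = lambda w: 0.0 if w < 0 else min(1.0, 0.6 + 0.4 * w / k1)
    print(transferred_service_cdf(b, F, k1, k2, 1.0))
```

Because G depends on F and F depends on G through Equations 11 and 12, a routine of this kind would have to be called repeatedly inside an outer iteration.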

Before proceeding to compute the losses incurred in the system, we note that since we are considering
a homogeneous system (in which each node experiences identical external job arrival rates), the long
term time averages of the unfinished work at each node will be the same. We therefore do not attach
a superscript i to the variables to denote the node. As in section 4, in the homogeneous case the
system wide losses can again be easily computed by simply observing a single node in isolation.

5.2 Job Loss with no Load Sharing

With no load sharing, λ2 = 0, and the problem reduces to a bounded waiting time problem for an
M/M/1 queue. Equations 11 and 12 become identical to those derived in [8]. Analytical expressions
for f(w) have been obtained in [8]. With P^(r) denoting the probability that a job cannot be processed
locally, the loss rate experienced by a single node can be easily computed from,

\[
\text{Loss} = \lambda_1 P^{(r)}
\tag{15}
\]


5.3  Job  Loss  with  QDLS 

For a homogeneous system the QDLS policy reduces to the Probing Policy with Lp = 1. Therefore,
we discuss the losses incurred by a single node in the following section. We note that for a heterogeneous
system, extensive numerical computations are required both to solve Equations 11 and 12 and
to optimize the losses. We have thus chosen to study the simpler homogeneous system.

5.4  Job  Loss  with  Probing 

When the Probing Policy is used, locally rejected jobs with service time greater than K2 are simply
lost. The loss rate at any node due to these large service times (the stream λ^(s) in Figure 6) is given by,

\[
\lambda^{(s)} = \lambda_1\left[1 - B(K_1) + \int_{K_2}^{K_1} b(u)\left[1 - F(K_1-u)\right]du\right].
\]

For the remaining jobs in the locally rejected stream λ^(r) (those with service time less than K2, denoted
by the stream λ^(t) in Figure 6), probes are sent to Lp other nodes. A job is lost only if none of the nodes
which are probed can meet the deadline of the job. Thus the loss rate at a node due to congestion at
other nodes is, with G(y) = 0 for y < 0,

\[
\lambda^{(c)} = \left(\lambda^{(r)} - \lambda^{(s)}\right)\left[\int_0^{K_1}\{1 - G(K_2-u)\}\,d_uF(u)\right]^{L_p}.
\]

The loss rate at a single node can be determined from,

\[
\text{Loss} = \lambda^{(c)} + \lambda^{(s)}.
\tag{16}
\]
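
The structure of the loss expression can be illustrated with a few lines of arithmetic. The numbers below are hypothetical and chosen only to show how quickly the congestion term shrinks with the probe limit; the function and argument names are likewise assumptions.

```python
def probing_loss_rate(lam_rejected, lam_oversized, p_refuse, probe_limit):
    """Loss rate: oversized jobs plus eligible jobs refused by every probed node."""
    eligible = lam_rejected - lam_oversized           # jobs that may be transferred
    congestion_loss = eligible * p_refuse ** probe_limit
    return lam_oversized + congestion_loss


if __name__ == "__main__":
    # Hypothetical rates: 0.30 jobs/time unit rejected locally, 0.05 of them too long
    # for any node, and each probe refused with probability 0.4.
    for lp in (0, 1, 3, 5):
        print(lp, round(probing_loss_rate(0.30, 0.05, 0.4, lp), 5))
```

In this example the loss falls from 0.30 with no probing to 0.15 with one probe, 0.066 with three and about 0.0526 with five, so most of the benefit is already captured by a small probe limit.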

Equations 11 and 12 can be numerically solved for a given set of parameters λ1, λ2, K1, K2 and F(0+)
(the probability mass of the unfinished work at 0) (see Appendix B). However, F(0+) and λ2 are still
unknown. We now exploit the homogeneity of the system (see [7]) to obtain the flow constraint at a
single node. As described in section 4.3.2, at steady state the flow rate at which jobs are successfully
transferred out of a node must equal the flow rate at which probes are accepted. Thus, with G(y) = 0
for y < 0,

\[
\lambda_2\int_0^{K_1} G(K_2-u)\,d_uF(u) = \lambda^{(r)} - \lambda^{(s)} - \lambda^{(c)}
\tag{17}
\]

where λ^(r) is the rate at which jobs are rejected locally, λ^(s) is the loss rate due to large service
times and λ^(c) is the loss rate due to congestion at other nodes. Equations 13 and 17 can now be
solved to determine the unknowns F(0+) and λ2.

5.5  Numerical  Results 

In this section we compare the losses of the probing policy for real-time tasks with bounded waiting
time with those of no LS and perfect LS for a system of 20 nodes. We assume that the service times
of the jobs that arrive externally are exponentially distributed with rate μ = 1 job/second.

To solve for F(0+) and λ2 in equations 13 and 17, standard IMSL packages were used. However,
to numerically perform the integration required in equations 13 and 17, the values of the function at
discrete points are required. These values were determined from equations 11 and 12. Note that
the left hand side of these equations can be expressed as f(w) (the pdf of the unfinished work), which
transforms the integro-differential equations to integral equations. The integral equations were solved
using the method of substitution and the integration was numerically performed using the trapezoidal
rule.
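
To make the discretisation concrete, the sketch below solves the simplest case, no load sharing (λ2 = 0), on a uniform grid with the trapezoidal rule. It is not the procedure used for the results reported here: it replaces the method of substitution with a direct marching scheme, assumes exponential service times, and uses names of our own choosing.

```python
import math


def no_ls_loss_fraction(lam, k1, mu=1.0, n=600):
    """Fraction of jobs lost at an isolated node without load sharing (lambda_2 = 0)."""
    h = k1 / n

    def cdf(x):
        # Exponential service-time distribution B(x); an assumption of this sketch.
        return 1.0 - math.exp(-mu * x) if x > 0 else 0.0

    def trapezoid(values):
        return h * (sum(values) - 0.5 * (values[0] + values[-1]))

    w = [i * h for i in range(n + 1)]
    f = [lam * cdf(k1)] + [0.0] * n           # f(0+), with F(0+) provisionally set to 1
    for i in range(1, n + 1):
        kernel = [cdf(k1 - w[j]) - cdf(w[i] - w[j]) for j in range(i + 1)]
        # Trapezoidal sum over the grid points already solved (0 .. i-1).
        known = 0.5 * kernel[0] * f[0] + sum(kernel[j] * f[j] for j in range(1, i))
        source = lam * (cdf(k1) - cdf(w[i]))  # contribution of the mass at zero
        f[i] = (source + lam * h * known) / (1.0 - 0.5 * lam * h * kernel[i])
    f0 = 1.0 / (1.0 + trapezoid(f))           # normalisation: F(0+) plus the density mass is 1
    rejected = [(1.0 - cdf(k1 - w[i])) * f[i] for i in range(n + 1)]
    return f0 * ((1.0 - cdf(k1)) + trapezoid(rejected))


if __name__ == "__main__":
    for lam in (0.2, 0.5, 0.8):
        print(lam, round(no_ls_loss_fraction(lam, 1.5), 4))
```

Because the equation is linear in the pair (F(0+), f), the sketch solves it with F(0+) set to 1 and rescales afterwards; with load sharing this shortcut is no longer available, since G(x) and λ2 depend on the solution itself.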

One technique to determine the value of the function f(w) at discrete points, for a given set of
F(0+) and λ2, would be to explicitly substitute the expression of the PDF G(x) using equation 14 in
equation 11. Thus f(w) becomes a function of the integral containing F(w). However, for incorrect
values of F(0+) and λ2, initially provided by IMSL, the solution of f(w) (using the method of
substitution to solve equation 11) fails to converge.

A second approach (used here) is to iteratively determine the value of G(x) and
use the given set of G(x) at each step to solve the integral equation 11 (see Appendix B for details).
Only partial success was obtained and convergence still posed a considerable problem, particularly for large
values of K1 and λ1. For these values, the losses due to failure in probe attempts become significant
and the contribution of congestion losses to the system wide loss rate is larger than that of the losses
due to large service times. In fact, for these cases simulations can be performed in a shorter amount
of time. Where possible we have provided
numerical values for the losses. For tight time constraints, e.g., K1 = 1.5, convergence can be achieved
relatively fast. These results are plotted in Figure 7. For a larger time constraint, K1 = 3.0 (see Figure
8), convergence was slower. For example, for Lp = 5 and λ1 > 1.4, convergence of our numerical technique
posed a great problem. Nevertheless, for higher values of K1, we were able to obtain convergence for
values of λ1 < 0.7 (see Figure 9). It should be pointed out that our technique thus does provide useful
results for most values of the practical range of system parameters.

Figure 7: Job loss for Probing for tasks with bounded queue time for K1 = 1.5

For Lp = 0, the probing policy reduces to NLS, for which an analytical solution of the function f(w)
is available in [8]. The losses computed from equation 15 for Lp = 0 were identical to those obtained
in [8].

As in the previous section, the "Perfect LS" system is modelled by an M/M/20 queueing system
with time constraint K1. This is identical to the system where the jobs are scheduled among the node
processors using complete state information. In the ideal case this information is known at no cost
and jobs incur no transfer delays. Thus the performance of the "ideal" LS policy provides an unattainable
lower bound on the losses.

Figure 7 plots the fraction of jobs lost for an extremely tight time constraint, K1 = 1.5 time units,
and transfer delay of 0.1 time units. Note that since μ = 1, 22.3% of the jobs have service times
greater than 1.5 and hence a minimum of 22.3% of the jobs are always lost. The graph clearly shows the vast
improvement in the performance of the system, even for Lp = 1. For low arrival rates, λ1 < 0.8 (which is
80% of the practical range of interest of λ1), Lp = 1 achieves close to ideal performance. The simulation
results are plotted for Lp = 1 and Lp = 3. These were obtained without imposing the Poisson
assumption on the arrival rates of the probes (λ2). The close match between the simulation results and
the numerical results thus justifies this assumption. In the simulations we have also accounted for the
fact that the state of the system may undergo a change during the period between the time a node first
responds positively to a probe and the arrival of the job.

Figure 8 shows the performance of the LS policies for K1 = 3.0. For this value of the time constraint,
only about 5% of the jobs are lost due to large service times (> 3.0). Accordingly, load sharing plays
an important role in preventing losses even at low values of λ1. For λ1 < 0.8 we see that Lp = 3 gives
rise to losses that are close to the ideal. Note that the probe limit required is larger than for K1 = 1.5.
This is due to the fact that a larger percentage of the jobs which cannot be locally processed are now
eligible for transfer. Also, since the time constraint is larger, more external jobs can be accepted by a
node, thus increasing its unfinished work load. Values obtained via simulations are plotted for Lp = 3.
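
The floor on the loss quoted for each figure follows directly from the exponential service time assumption: a job whose service time alone exceeds K1 can never meet its deadline. A short check, with a function name of our own choosing:

```python
import math


def oversized_fraction(k1, mu=1.0):
    """Probability that an exponential(mu) service time exceeds the deadline K1."""
    return math.exp(-mu * k1)


if __name__ == "__main__":
    for k1 in (1.5, 3.0, 6.0):
        print(f"K1 = {k1}: {100 * oversized_fraction(k1):.1f}% of jobs can never meet the deadline")
```

For μ = 1 this gives about 22.3%, 5.0% and 0.2% for K1 = 1.5, 3.0 and 6.0 respectively, which is why load sharing has progressively more room to help as the time constraint is relaxed.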

Figure 9 plots the loss incurred by the system for a much more relaxed time constraint, K1 = 6.0.
As mentioned earlier, we were able to obtain results only for values of λ1 < 0.7. Surprisingly, we see that
with Lp = 3 the losses incurred are close to ideal for λ1 < 0.8.

A common trend observed in all the performance curves is that the probing policy with a probe limit
of one achieves a great reduction in the losses over the NLS policy for a wide range of λ1. Further
improvement in the performance is obtained by using Lp = 3, and we see that this value of the probe
limit reduces the losses significantly over those obtained with Lp = 1, particularly for larger values of
λ1. Note, however, that this additional reduction is smaller than that obtained in going from NLS to
Lp = 1. By comparing the graphs for Lp = 3 and Lp = 5, particularly
for K1 = 6.0, we note that choosing higher values of Lp provides only a marginal improvement. This
observation only serves to strengthen our claim that relatively small probe limits are adequate.

Comparing the losses experienced by real-time systems with the two classes of tasks in Figures 3, 7
and 9, we observe that the losses are significantly smaller (for equal arrival rates and comparable time
constraints) for the system in which jobs must complete execution within the fixed time limit. This
is true since all jobs with large service times are filtered out, allowing for a larger number of shorter
jobs to be processed locally. The reverse holds for tasks with bounded queue time. A large job greatly
increases the unfinished work in the system, thus preventing all jobs which arrive during its residency
in the system from executing locally. As such, the average waiting time of bounded-queueing-time jobs
with time constraint K1 will be larger than that of bounded-waiting-time jobs with time constraint
K1 + μ.

6 Conclusions

In this paper, we have examined the relative performance of several different decentralized approaches
towards load sharing in order to address the question of determining the appropriate level
of complexity for load sharing algorithms in a distributed real-time environment. Queueing theoretic
models were developed to quantitatively assess the performance of two relatively simple approaches
towards load sharing as well as the bounding case of no load sharing. The "ideal" case of load sharing
with perfect information and no transfer delays was studied through simulation. The assumptions and
approximations made in our analysis were validated through simulation.

A major conclusion of this study of real-time LS approaches is complementary to that previously
established for non-real-time systems [21, 7]: simple approaches, which use a minimum amount of global
state information and involve very simple decision mechanisms, can often achieve a performance level
close to that of a theoretically optimum real-time load sharing algorithm. A corollary then is that for
all but the tightest of time constraints (e.g., values of the time constraint, K1, less than the average
job service time), a more sophisticated approach towards real-time load sharing can often result in
only a small marginal performance improvement over the extremely simple load sharing algorithms. In
particular, it was shown that a simple probing approach using a small probe limit performed close to
optimal over a wide range of arrival rates and for all but the most stringent time constraints.

We believe that future work in this area may be directed towards extending and generalizing the
results presented in this paper. In particular, it is of interest to develop performance models for
systems in which arriving jobs may have a deadline drawn from additional deadline distributions and
again assess the appropriate level of complexity for load sharing algorithms in these cases. It would
also be of interest to consider the local scheduling of jobs to processors in these cases.

Appendix A : Derivation of Equations 2, 11 and 12 Using Level Crossing [4] [6]

Figure 10: Level Crossing in the region w < K2

As shown in figure 10, if we plot the unfinished work in the generic queue in figure 2 as a function
of time, we obtain a "sawtooth" line, where the vertical jumps represent increments of work brought
to the queue by an arriving customer and the slope of the decreasing sections of the line is -1. The
point at which an increasing or decreasing section of the sawtooth line intersects a horizontal line of
height w is referred to as an "upcrossing" or "downcrossing" at w, respectively.

A major result of level crossing states that the density function, f(w), of the "virtual waiting time"
(i.e., the total unfinished work in the queueing system) is equal to the rate at which downcrossings cross
a line of constant height, w, and that for ergodic systems, the rate of downcrossings equals the rate of
upcrossings through this line [4] [6].
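
The balance between upcrossings and downcrossings is easy to observe on a simulated sample path. The sketch below uses a plain M/M/1 queue with no deadlines, which is a simplifying assumption made here for illustration; the function name and default parameters are also ours.

```python
import random


def count_crossings(level, lam=0.8, mu=1.0, n_arrivals=20_000, seed=7):
    """Count up- and downcrossings of `level` on a simulated unfinished-work path."""
    rng = random.Random(seed)
    work, up, down = 0.0, 0, 0
    for _ in range(n_arrivals):
        gap = rng.expovariate(lam)            # time until the next arrival
        drained = max(0.0, work - gap)        # work decreases with slope -1
        if work > level > drained:
            down += 1                         # the decreasing segment passed the level
        jumped = drained + rng.expovariate(mu)
        if drained < level < jumped:
            up += 1                           # the arrival's jump passed the level
        work = jumped
    return up, down


if __name__ == "__main__":
    print(count_crossings(level=1.0))
```

Since the path starts below the level and must come back down before it can cross upwards again, the two counts can differ by at most one, so their long-run rates coincide.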

Tasks with Bounded Queueing Time:

In order to determine f(w) in the region w < K2, we note that an upcrossing occurs at w when
an arrival to the generic queue finds some amount, u (u < w), of unfinished work in the queue upon
its arrival, itself joins the queue, and brings in an amount of work greater than w - u. If B(x) is
the PDF of the service time demands of an arriving customer, then the probability that the amount
of work brought in by an arriving customer is greater than w - u is simply 1 - B(w - u), and the rate
of upcrossings at w in our generic queue is given by

\[
\text{rate of upcrossings} = (\lambda_1 + \lambda_2)\int_0^{w}\left(1 - B(w-u)\right)f(u)\,du
\]

Equating the rate of upcrossings to f(w), the rate of downcrossings, immediately gives equation 1 in
the region w < K2.

In order to determine f(w) in the region K2 < w < K1, we separately consider upcrossings due to
arrivals at the lower and upper queues in figure 2. Following an identical argument as above, the rate
of upcrossings at w due to arrivals at the lower queue is given by:

\[
\text{rate of upcrossings due to arrivals at lower queue} = \lambda_1\int_0^{w}\left(1 - B(w-u)\right)f(u)\,du
\]

Note that a job arriving at the upper queue will only join the queue if it finds an amount of unfinished
work, u, less than K2. The rate of upcrossings due to arrivals at the upper queue is thus given by:

\[
\text{rate of upcrossings due to arrivals at upper queue} = \lambda_2\int_0^{K_2}\left(1 - B(w-u)\right)f(u)\,du
\]

Equating the rate of upcrossings and f(w), the rate of downcrossings at w, immediately yields
equation 2.

Tasks  with  Bounded  Waiting  Time: 

Following the above arguments, for 0 < w < K2, a job arriving at the lower queue will give rise to
an upcrossing from some level u (< w) iff the amount of work brought in by the job exceeds w - u but
is less than K1 - u (otherwise the job's deadline cannot be met). Since B(x) is the PDF of the service
times of jobs arriving at the lower queue, the probability that an upcrossing occurs is
[B(K1 - u) - B(w - u)] f(u) du. Similarly,
for a job arriving in the upper queue the probability that an upcrossing occurs from the level u is
[G(K2 - u) - G(w - u)] f(u) du. Therefore, the total rate of upcrossings is given by,

\[
\text{rate of upcrossings due to an arrival} = \lambda_1\int_0^{w}\{B(K_1-u) - B(w-u)\}\,d_uF(u)
 + \lambda_2\int_0^{w}\{G(K_2-u) - G(w-u)\}\,d_uF(u)
\]

Equating the rate of downcrossings with that of upcrossings yields Equation 11.

In the second region, K2 < w < K1, an upcrossing from any level u (< w) will occur only when a
job arrives at the lower queue. Hence,

\[
\text{rate of upcrossings due to an arrival} = \lambda_1\int_0^{w}\{B(K_1-u) - B(w-u)\}\,d_uF(u)
\]

Equation 12 is obtained by equating the rates of upcrossings and downcrossings.


Appendix B : Solution of Equations 3, 11 and 12

Tasks with Bounded Queueing Time:

For the purpose of clarity let F1(w) denote the function F(w) in the region 0 < w < K2 and F2(w)
denote the function F(w) in the region K2 < w < K1. Consider equation 2 in the region 0 < w < K2:

\[
-\frac{dF_1(w)}{dw} = (\lambda_1 + \lambda_2)\left\{\int_0^{w} B(w-u)\,d_uF_1(u) - F_1(w)\right\}
\]

Define f1(w) = dF1(w)/dw. Let F1* denote the Laplace Transform of f1(w). Taking the Laplace transform
and rearranging the terms we get [10],

\[
F_1^*(s) = \frac{s\,F_1(0^+)}{s + (\lambda_1 + \lambda_2)\left(B^*(s) - 1\right)}
\tag{18}
\]


Assuming service times are exponentially distributed, we have B*(s) = μ/(μ + s). Thus,

\[
F_1^*(s) = \frac{s\,F_1(0^+)}{s + (\lambda_1 + \lambda_2)\left(\mu/(\mu + s) - 1\right)}
\]

On taking the inverse Laplace transform, we obtain an expression for f1(w). Then F1(w) is simply
given by:

\[
F_1(w) = \int_0^{w} f_1(u)\,du
\]
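
As a worked illustration of this inversion (our own algebra, not reproduced from the original text), write λ = λ1 + λ2 and assume λ ≠ μ. The transform then simplifies to a constant plus a single pole, and the term δ(w) below represents the probability mass F1(0+) at zero:

```latex
\begin{align*}
F_1^*(s) &= \frac{s\,F_1(0^+)}{s - \lambda s/(\mu+s)}
          = F_1(0^+)\,\frac{s+\mu}{s+\mu-\lambda}
          = F_1(0^+)\left[1 + \frac{\lambda}{s+\mu-\lambda}\right],\\
f_1(w)   &= F_1(0^+)\left[\delta(w) + \lambda\,e^{-(\mu-\lambda)w}\right],\\
F_1(w)   &= F_1(0^+)\left[1 + \frac{\lambda}{\mu-\lambda}\left(1 - e^{-(\mu-\lambda)w}\right)\right],
            \qquad 0 \le w < K_2 .
\end{align*}
```

The density has the familiar M/M/1 exponential shape on this region; the deadlines enter only through the constant F1(0+), which is fixed by normalisation over the whole interval up to K1.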

The solution to equation 2 in the region K2 < w < K1 is obtained in a similar manner. We can
rewrite the expression as:

\[
-\frac{dF_2(w)}{dw} = (\lambda_1 + \lambda_2)\left\{\int_0^{K_2} B(w-u)\,d_uF_1(u) - F_1(K_2)\right\}
 + \lambda_1\left\{\int_{K_2}^{w} B(w-u)\,d_uF_2(u) - \int_{K_2}^{w} d_uF_2(u)\right\}
\]

Define g(w) = U(w - K2) f2(w), where f2(w) is the density function in the region defined by K2 <
w < K1 and U(w - K2) is a step function. Let G*(s) be the Laplace transform of g(w). Then for
B*(s) = μ/(μ + s) we get,

\[
G^*(s) = \frac{A\,e^{-sK_2}}{s + \mu - \lambda_1}
\]

where

\[
A = (\lambda_1 + \lambda_2)\,e^{-\mu K_2}\int_0^{K_2} e^{\mu u}\,d_uF_1(u).
\]

On inverting the above transform we obtain an expression for f2(w) in the region w > K2. The desired
result for F2(w) can now be easily derived since

\[
F_2(w) = \int_0^{K_2} f_1(u)\,du + \int_{K_2}^{w} f_2(u)\,du
\]


Note that although we have derived expressions for an exponentially distributed service time, the
above technique can be used to solve for a general class of service time distributions.

Tasks  with  Bounded  Waiting  Time: 

Separating the value of F(w) at zero (denoted by F(0+)), Equations 11 and 12 can be rewritten as:

\[
\begin{aligned}
f(w) ={}& \lambda_1\{B(K_1) - B(w)\}F(0^+) + \lambda_2\{G(K_2) - G(w)\}F(0^+) \\
&+ \lambda_1\int_{0^+}^{w}\{B(K_1-u) - B(w-u)\}\,f(u)\,du \\
&+ \lambda_2\int_{0^+}^{w}\{G(K_2-u) - G(w-u)\}\,f(u)\,du, \qquad 0 < w < K_2
\end{aligned}
\]

\[
f(w) = \lambda_1\{B(K_1) - B(w)\}F(0^+)
 + \lambda_1\int_{0^+}^{w}\{B(K_1-u) - B(w-u)\}\,f(u)\,du, \qquad K_2 < w < K_1
\]

Similarly, equation 13 takes the form,

\[
F(0^+) + \int_{0^+}^{K_1} f(u)\,du = 1
\]

Hence, with the knowledge of the values of the function in the interval (0, K1], the two unknowns
F(0+) and λ2 can be computed from equations 13 and 17.

The explicit form of the function G(x) was not used due to the poor convergence obtained while
solving for F(0+) and λ2. A second level of iterations was introduced to compute G(x). Our algorithm
was as follows:


1. Initialise $G(x) = B(x)$, $0 < x < K_1$

2. WHILE NOT DONE

•  For the given value of $G(x)$, compute the value of $F(x)$ for $x \in [0, K_1]$, using the method of
substitution to solve the integral equations 11 and 12.

•  Determine $F(0^+)$ and $\lambda_2$ (using IMSL routines) by solving equations 13 and 17.

•  Compute the fraction of jobs lost using equation 16.

•  Compute the new $G(x)$ from equation 14.

3. DONE = TRUE when three successive iterations give losses within 1% of each other.
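
The outer loop and its stopping rule can be sketched as follows. This is only an illustration of the control flow: the function `step`, which would stand for one pass through the bulleted computations (equations 11 to 17 are not reproduced here), and the toy update `toy_step` are hypothetical placeholders, not the routines used in the report.

```python
def iterate_until_stable(step, g0, rel_tol=0.01, window=3, max_iter=1000):
    """Repeat g -> step(g) until `window` successive losses agree within rel_tol.

    `step` is a hypothetical callable returning (new_g, loss); in the report it
    would wrap the solution of the integral equations and the loss computation.
    """
    g = g0
    losses = []
    for _ in range(max_iter):
        g, loss = step(g)
        losses.append(loss)
        recent = losses[-window:]
        if len(recent) == window:
            lo, hi = min(recent), max(recent)
            # "within 1% of each other": spread relative to the largest value
            if hi - lo <= rel_tol * abs(hi):
                return g, loss, len(losses)
    raise RuntimeError("no convergence within max_iter iterations")


def toy_step(g):
    """Toy stand-in for one pass of the loop (Newton update for sqrt(2))."""
    new_g = 0.5 * (g + 2.0 / g)
    return new_g, new_g


g_final, loss_final, n_iter = iterate_until_stable(toy_step, 1.0)
```

Requiring agreement over a window of three iterations, rather than two, guards against stopping on a single small step of a slowly converging sequence.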
AUGMENTED CHAIN ANALYSIS OF MARKOV AND SEMI-MARKOV PROCESSES *


STEPHEN G. STRICKLAND and CHRISTOS G. CASSANDRAS
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst, MA 01003

ABSTRACT 

We present a new method for estimating performance sensitivities, with respect to parameters
of interest, of Markov and (some) semi-Markov processes from information contained in a
single nominal sample realisation. Given a nominal parameter value, we use a perturbation
of the parameter to define a perturbed system. We then construct what we call an augmented
chain which in effect allows us to construct perturbed realisations from nominal ones, and
hence compute sensitivities of any quantities measurable on these realisations. We show that
the "observability" problem encountered in our earlier work can be overcome through an
appropriate transformation of the augmented chain and present some experimental results.

1  INTRODUCTION 

In performance optimisation problems involving discrete event dynamic systems, analytical expressions
of the performance measure (in terms of controllable parameters) generally do not exist. Gradient-based
optimisation methods typically employed in such cases require the sensitivity (or partial derivative) of
the performance with respect to the parameter(s) over which the optimisation is being performed.
Traditionally, one resorts to simulation and uses a finite difference estimator to compute these sensitivities;
this requires two simulation runs for each parameter. Recently, several new approaches have been
developed, which extract information from an observed nominal state trajectory of the system in question,
and directly estimate the sensitivity of the performance measure with respect to a parameter of
interest. These sample path based techniques include the Likelihood-Ratio Method [8],[3] and Perturbation
Analysis [4],[5]-[7]. In general, these methods realise considerable computational savings as compared
to the two-simulation approach since they require only a single simulation run. More importantly, since
they involve only observed data, they may also be used in on-line or real-time control schemes.

This paper considers a new sample path based method which has significant advantages in many
situations. One advantage of this approach is that nominal system parameters need not be known.
Further, the method can be applied to discrete (integer-valued) parameters (e.g. buffer capacities, routing
thresholds, customer class sizes) for which the performance measures are necessarily discontinuous.

The method presumes the existence of two Markov chains whose structure is known. We assume
both represent the same underlying system of interest, but differ in the value of some parameter; thus,
we refer to them as the nominal and perturbed chains. Observation of a sample realisation of the nominal
system allows direct measurement of its sample performance. If we can at the same time estimate the
sample performance for a perturbed realisation, then we can immediately compute a finite difference
sensitivity estimate. We accomplish this by using the event-driven nature of the underlying system to
construct an augmented chain related to the nominal and perturbed chains in the following two ways:

(1) The augmented chain is stochastically similar to the perturbed chain in that the stationary state
probabilities of the perturbed chain are obtainable as the probabilities of appropriately defined aggregate
states in the augmented chain.
(2) The augmented chain is observable with respect to the nominal chain
in that we can estimate the augmented chain state probabilities using information contained in a single
observed nominal realisation.

In previous work [1] we developed a method for constructing an augmented chain which was
always stochastically similar to both the nominal and the perturbed chains; however, it was not always
observable with respect to the nominal chain. We proposed a solution to this problem which involved
generating additional "artificial" events to supplement those observed in the nominal sample path. This
requires, however, that we know or estimate the rate parameter associated with these events. In this
paper, using a more direct construction, we formalise the notion of observability and show that under
some general conditions an observable augmented chain can always be constructed through a
transformation of the initial augmented chain. We consider extensions to semi-Markov processes and present
some experimental results.

* This work is supported in part by the National Science Foundation under grant CCS-SSOtGTS and by the Rome Air
Development Center under contract F30>602-8t-C-0169
2  DIRECT SYNTHESIS OF AUGMENTED CHAINS


In this section we extend our previous results [1] by imposing an event structure on the Markov chains
under consideration. This allows us to obtain an augmented chain directly, as well as accommodate a
more general class of Markov chains. In addition, it results in a more compact representation.

We  assume  that  the  Markov  chains  considered  meet  the  following  conditions: 

(A1) There is a finite set of events; each transition defined in a chain is associated with a unique event.
Moreover, all transitions associated with a given event have the same rate (i.e. the transition rate
is a function of the event type alone).

(A2) For each state, there is at most one outgoing transition corresponding to each event.

Remark: We can accommodate state dependent transition rates by expanding the number of event types.


2.1  Definitions  and  Notation 

Consider a discrete event dynamic system, represented by $(S, E, D)$, where $S$ is a state space, $E$ is a
set of events which cause all possible state transitions, and $D$ is a transition function, $D : S \times E \to S$.

Let $E^{\alpha}(s)$ denote the set of events which can occur when the system is in state $s$; we will refer to
this as the feasible set of events at state $s$. Then, given $s \in S$ and some $e \in E$, we define $D(s, e)$ as

$$D(s, e) = \begin{cases} \text{destination state when } e \text{ occurs in state } s & \text{if } e \in E^{\alpha}(s) \\ 0 & \text{otherwise} \end{cases}$$

If an event $e \notin E^{\alpha}(s)$ occurs at state $s$, the event is effectively ignored and the state is unchanged.
This is to be distinguished from the case of a "self-loop" transition, where $D(s, e) = s$ for some event
$e \in E^{\alpha}(s)$. Note that $D(s, e)$ also defines a destination matrix describing the system.

A Markov chain $\alpha$ is obtained from the discrete event system definition above by requiring that the
events of each type constitute Poisson processes, and by providing an intensity function $F : E \to R$,
characterising each of these processes. Thus, we may write $\alpha = \{S, E, D, F\}$.

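Note that we can easily obtain the infinitesimal generator of $\alpha$, $Q$, from $D$ and $F$ as

$$Q(s_i, s_j) = \begin{cases} \sum_{e \in E^{\alpha}(s_i)} \delta(D(s_i, e), s_j)\, F(e) & s_i \neq s_j \\ -\sum_{s_k \neq s_i} Q(s_i, s_k) & s_i = s_j \end{cases} \qquad (1)$$

where $s_i, s_j \in S$, and we effectively sum the rates of all possible transitions from $s_i$ to $s_j$ caused by
events in $E^{\alpha}(s_i)$ ($\delta(\cdot, \cdot)$ is the indicator function: $\delta(x, y) = 1$ if $x = y$ and 0 otherwise).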
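
As an illustrative sketch of equation (1), the generator can be assembled directly from a destination map and an intensity function. The dictionary encoding below (a missing key meaning that the event is infeasible in that state) and the numerical rates are assumptions made for the example, not part of the original formulation.

```python
def generator(states, events, D, F):
    """Build Q as a dict of dicts from D and F, following equation (1).

    D maps (state, event) -> destination state; a missing key means the event
    is not feasible in that state. F maps event -> rate.
    """
    Q = {s: {t: 0.0 for t in states} for s in states}
    for s in states:
        for e in events:
            t = D.get((s, e))
            if t is not None and t != s:      # self-loops do not change the state
                Q[s][t] += F[e]
        # diagonal entry: minus the sum of the off-diagonal rates of the row
        Q[s][s] = -sum(Q[s][t] for t in states if t != s)
    return Q


# M/M/1/1 example; treating an arrival at the full state as a self-loop
# is an assumption of this sketch.
D1 = {(0, "a"): 1, (1, "a"): 1, (1, "d"): 0}
F1 = {"a": 1.0, "d": 2.0}                      # assumed rates: lambda = 1, mu = 2
Q1 = generator([0, 1], ["a", "d"], D1, F1)
```

With these assumed rates the result is the familiar two-state generator with rows (-1, 1) and (2, -2).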

2.2  Construction  and  Properties  of  Reduced  Augmented  Chains 

Given the preceding notation, consider two Markov chains $\alpha_1 = \{S_1, E_1, D_1, F_1\}$ and $\alpha_2 = \{S_2, E_2, D_2, F_2\}$.

For convenience, we make the following additional assumption:


(A3) $\alpha_1$ and $\alpha_2$ are finite, irreducible, and ergodic chains; hence, they have unique stationary state
probability vectors $\pi_1, \pi_2$, determined by $Q_i \pi_i = 0$, $i = 1, 2$.


We then define the Reduced Augmented Chain (RAC) (the term "reduced" is used to distinguish this
augmented chain from the maximal augmented chain (MAC) defined in [1]) corresponding to $\alpha_1$ and
$\alpha_2$, as a Markov chain

$$\alpha_R = \{S_R, E_R, D_R, F_R\}$$

where

$$S_R = S_1 \times S_2$$

$$E_R = E_1 \cup E_2$$
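and $D_R$ is defined for each element $(s_i, s_j) \in S_R$ with $s_i \in S_1$ and $s_j \in S_2$ by

$$D_R((s_i, s_j), e) = \begin{cases} (D_1(s_i, e),\, D_2(s_j, e)) & \text{if } D_1(s_i, e) \neq 0,\ D_2(s_j, e) \neq 0 \\ (D_1(s_i, e),\, s_j) & \text{if } D_1(s_i, e) \neq 0,\ D_2(s_j, e) = 0 \\ (s_i,\, D_2(s_j, e)) & \text{if } D_1(s_i, e) = 0,\ D_2(s_j, e) \neq 0 \\ 0 & \text{if } D_1(s_i, e) = 0,\ D_2(s_j, e) = 0 \end{cases} \qquad (2)$$

and finally, $F_R$ is given by

$$F_R(e) = \begin{cases} F_1(e) & \text{if } e \in E_1 \\ F_2(e) & \text{if } e \in E_2,\ e \notin E_1 \end{cases} \qquad (3)$$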


It is shown in [2] that this definition results in a RAC identical to that obtained in our earlier work
[1]. A key feature of $\alpha_R$ is that it generally contains transient states, which can be removed since we
are interested in the stationary state probabilities. A simple algorithm for constructing directly the
irreducible, ergodic sub-chain of the RAC (i.e. the RAC with transient states removed), given an initial
state, is also given in [2]. Hereafter, when referring to the RAC, we will assume that all transient states
have been removed; thus, $\alpha_R$ refers to the irreducible set of states containing a given initial state.

An example of a RAC construction is shown in Figure 1, where we show the RAC which results
from a nominal chain representing a homogeneous M/M/1/1 queueing system, with transition rates $\lambda$
(for arrival events, $a$) and $\mu$ (for departure events, $d$), and a perturbed chain representing an M/M/1/2
system with the same transition rates. The elements $S$, $E$, $D$, and $F$ of our representation are shown
along with the corresponding state diagrams. Note that the construction includes two transient states,
which are ignored.
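
The construction for this example can be sketched in a few lines. The code below applies the four-case definition of $D_R$ and then collects the states reachable from a chosen initial state by breadth-first search. This search is our own stand-in for the algorithm of [2], which is not reproduced in this paper; the dictionary encoding of $D_1$ and $D_2$, and the modelling of an arrival to a full queue as a feasible self-loop, are likewise assumptions.

```python
from collections import deque


def rac_transition(D1, D2, state, e):
    """Destination of RAC state (s1, s2) under event e, or None if e is infeasible in both chains."""
    s1, s2 = state
    t1, t2 = D1.get((s1, e)), D2.get((s2, e))
    if t1 is None and t2 is None:
        return None
    # a component whose chain ignores the event keeps its current value
    return (s1 if t1 is None else t1, s2 if t2 is None else t2)


def reachable_rac(D1, D2, events, start):
    """Set of RAC states reachable from `start` (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for e in events:
            t = rac_transition(D1, D2, s, e)
            if t is not None and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


# Nominal M/M/1/1 and perturbed M/M/1/2; states are queue lengths.
D1 = {(0, "a"): 1, (1, "a"): 1, (1, "d"): 0}
D2 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2, (1, "d"): 0, (2, "d"): 1}
rac_states = reachable_rac(D1, D2, ["a", "d"], (0, 0))
```

Starting from the empty state in both chains, four of the six product states are reached; the remaining two, (0, 2) and (1, 0), are the transient states mentioned above.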

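[Figure 1 panels: nominal chain with $S_1 = \{s_0, s_1\}$, $E_1 = \{a, d\}$, $F_1(a) = \lambda$, $F_1(d) = \mu$; perturbed chain with
$S_2 = \{s_0, s_1, s_2\}$, $E_2 = \{a, d\}$, $F_2(a) = \lambda$; RAC with four states, $F_R(a) = \lambda$, $F_R(d) = \mu$. The destination
matrices and state diagrams are not recoverable.]

Figure 1: RAC For The M/M/1/1, M/M/1/2 System Pair
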
In order to provide a more compact representation for $D_R$, let us define


which allows us to rewrite (2) as
2.2.1  The Stochastic Similarity Property

As previously mentioned, our motivation for constructing $\alpha_R$ is that it contains information about the
joint behavior of $\alpha_1$ and $\alpha_2$. In fact, if $S_R$ is appropriately partitioned, $\alpha_R$ can be transformed into
a Markov chain with the same stationary state probability vector as either $\alpha_1$ or $\alpha_2$. We refer to this
property as stochastic similarity.

Definition: Let $\alpha = \{S, E, D, F\}$ and $\alpha_0 = \{S_0, E_0, D_0, F_0\}$ be two Markov chains, with stationary
state probability vectors $\pi$ and $\pi_0$ respectively, and $\dim(S_0) = N$. The Markov chain $\alpha$ is said to be
stochastically similar to $\alpha_0$ with respect to $P$ iff there exists a partition $P = \{p_1, \ldots, p_N\}$ of $S$ such that

$$\pi(p_i) = \pi_0(s_i) \quad \text{for all } s_i \in S_0$$

Clearly, the partition $P$ results in a Markov chain $\alpha_P$. Letting $Q_P$ and $Q_0$ denote the infinitesimal
generators of $\alpha_P$ and $\alpha_0$, an alternative definition is to require that $Q_P = Q_0$.

Remark: Stochastic similarity implies that the transitions between the partition sets $p_i \in P$ (as
applied to $S$) constitute a realisation of $\alpha_0$. Thus we can explicitly extract a realisation of $\alpha_0$ from the
observed realisation of $\alpha$ by simply ignoring all transitions internal to any $p_i$.

Given the definition of stochastic similarity above, we now wish to establish the fact that the RAC,
$\alpha_R$, defined above, is indeed stochastically similar to both $\alpha_1$ and $\alpha_2$. Thus, we will show that elements
of $S_R$ can be aggregated in ways that allow us to express the stationary state probabilities of $\alpha_1$ and
$\alpha_2$ in terms of such "aggregate" or "composite" states. The following lemma identifies the fundamental
property of $\alpha_R$ which makes this possible.

Lemma 1: Let $\alpha = \{S, E, D, F\}$ and $\alpha_0 = \{S_0, E_0, D_0, F_0\}$ be two Markov chains, with infinitesimal
generators $Q$ and $Q_0$ respectively, and $\dim(S_0) = N$. If there exists a partition of $S$ given by

$$P = \{p_1, \ldots, p_N\}$$

such that for all $s \in p_i$ and $j \neq i$,

$$\sum_{t \in p_j} Q(s, t) = Q_0(s_i, s_j) \qquad (7)$$

where $s_i, s_j \in S_0$, then $\alpha$ is stochastically similar to $\alpha_0$ with respect to $P$.

Proof: (see [2])

We shall now use this lemma to construct appropriate partitions of the RAC defined above, and
establish stochastic similarity properties with $\alpha_1$ and $\alpha_2$. First, as in our earlier work [1], we define the
following partitions of $\alpha_R$:

$$P_r = \{ r_i : (s_k, s_j) \in r_i \text{ iff } s_k = s_i \in S_1,\ s_j \in S_2 \} \qquad (8)$$

$$P_c = \{ c_i : (s_k, s_j) \in c_i \text{ iff } s_j = s_i \in S_2,\ s_k \in S_1 \} \qquad (9)$$

The definition of $S_R$ as the cartesian product $S_1 \times S_2$ gives it a rectangular structure with the "rows"
associated with elements of $S_1$ and the "columns" associated with the elements of $S_2$. The partitions
above simply formalise this fact. Thus, we will refer to $r_i \in P_r$ as the "$i$th row" of $\alpha_R$ and $c_i \in P_c$ as
the "$i$th column" of $\alpha_R$.

In the following result, we show that the partitions $P_r$ and $P_c$ allow us to establish that $\alpha_R$ is
stochastically similar to $\alpha_1$ and $\alpha_2$, respectively.

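Theorem 1: Let $S_R$ be partitioned through $P_r$ and $P_c$. Then, $\alpha_R$ is stochastically similar to $\alpha_1$
with respect to $P_r$, and to $\alpha_2$ with respect to $P_c$, i.e.

$$\pi_R(r_i) = \pi_1(s_i) \quad \text{for all } s_i \in S_1 \qquad (10)$$

$$\pi_R(c_j) = \pi_2(s_j) \quad \text{for all } s_j \in S_2 \qquad (11)$$

Proof: (using Lemma 1; see [2]).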
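
Theorem 1 can be checked numerically on the M/M/1/1, M/M/1/2 example. The sketch below is our own illustration: it assumes rates $\lambda = 1$ and $\mu = 2$, writes out the RAC transition rates for that example by hand, and computes stationary vectors by power iteration on a uniformised chain (a convenient choice here, not a method taken from the paper).

```python
def stationary(states, rates, iters=5000):
    """Stationary vector by power iteration on the uniformised chain.

    rates maps (from_state, to_state) -> transition rate.
    """
    out = {s: 0.0 for s in states}
    for (u, _t), r in rates.items():
        out[u] += r
    big = 2.0 * max(out.values())              # uniformisation constant
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: pi[s] * (1.0 - out[s] / big) for s in states}
        for (u, t), r in rates.items():
            nxt[t] += pi[u] * r / big
        pi = nxt
    return pi


lam, mu = 1.0, 2.0                             # assumed numerical rates
nominal = {(0, 1): lam, (1, 0): mu}
perturbed = {(0, 1): lam, (1, 2): lam, (1, 0): mu, (2, 1): mu}
# RAC states are (nominal queue length, perturbed queue length)
rac = {((0, 0), (1, 1)): lam, ((1, 1), (1, 2)): lam, ((1, 1), (0, 0)): mu,
       ((1, 2), (0, 1)): mu, ((0, 1), (1, 2)): lam, ((0, 1), (0, 0)): mu}
rac_states = [(0, 0), (0, 1), (1, 1), (1, 2)]

pi1 = stationary([0, 1], nominal)
pi2 = stationary([0, 1, 2], perturbed)
piR = stationary(rac_states, rac)
rows = {i: sum(p for (a, _b), p in piR.items() if a == i) for i in (0, 1)}
cols = {j: sum(p for (_a, b), p in piR.items() if b == j) for j in (0, 1, 2)}
```

Under these assumptions the row sums reproduce the M/M/1/1 probabilities (2/3, 1/3) and the column sums reproduce the M/M/1/2 probabilities (4/7, 2/7, 1/7), in line with (10) and (11).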

The key implication is that a single realisation of $\alpha_R$ provides the same information as two distinct
realisations, one of $\alpha_1$ and the other of $\alpha_2$. Thus, we can use such a realisation to obtain information
regarding the behavior of both $\alpha_1$ and $\alpha_2$. Suppose that $\alpha_1$ represents a Markov chain model of a
nominal system, and let $\alpha_2$ be identical to $\alpha_1$ except for some specified parameter perturbation. Thus,
$\alpha_2$ corresponds to a perturbed system. In this context, a realisation of $\alpha_R$ provides sufficient information
to estimate performance sensitivities of the nominal system with respect to the perturbed parameter.
Neither $\alpha_1$ nor $\alpha_2$ need be observed.
2.2.2  The Observability Property

As already mentioned, the stochastic similarity property of the RAC, $\alpha_R$, presented in Theorem 1,
is of interest if sample paths of $\alpha_R$ can be conveniently constructed, given sample paths of $\alpha_1$ (the
directly observable Markov chain). This is indeed possible under an observability condition, which we
shall formalise in this section.

Consider a Markov chain $\alpha_0 = \{S_0, E_0, D_0, F_0\}$ and let $s_0 \in S_0$ be a specified initial state. Then,
for any event sequence $e = \{e_0, e_1, \ldots\}$, with $e_i \in E_0$ for all $i$, there is a corresponding state sequence
$s = \{s_0, s_1, \ldots\}$, with $s_i \in S_0$ for all $i$. Thus, $(s_0, s)$ describes a stochastic realisation of $\alpha_0$ in terms
of the sequence of states visited. Note that $s_i$, $i = 1, 2, \ldots$ represents the state of the system just after
event $e_i$ occurs. In case $e_i \notin E^{\alpha_0}(s_{i-1})$, we have $s_i = s_{i-1}$ and no actual transition takes place.

Now suppose we consider a second Markov chain $\alpha = \{S, E, D, F\}$, and specify an initial state
$t_0 \in S$. Given the same event sequence $e$, we may generate a state sequence $t = \{t_0, t_1, \ldots\}$ with $t_i \in S$
for all $i$. We define the notion of observability in terms of the relationship between $E^{\alpha_0}(s_i)$, the feasible
set of events causing transitions when $\alpha_0$ is in state $s_i$, and $E^{\alpha}(t_i)$, the corresponding feasible set of
events for $t_i \in S$.

Definition: Let $\alpha_0 = \{S_0, E_0, D_0, F_0\}$ and $\alpha = \{S, E, D, F\}$ be two Markov chains with specified
initial states $s_0 \in S_0$ and $t_0 \in S$ respectively. Let $e = \{e_i\}$, $i = 0, 1, \ldots$ be any event sequence with
$e_i \in E_0$ for all $i$, and $s = \{s_i\}$, $t = \{t_i\}$ the corresponding state sequences for $\alpha_0, \alpha$. Then,

1. An event $e \in E$ (or the corresponding state transition from $t_i$ to $t_{i+1}$) is said to be observable with
respect to $\alpha_0$ iff

$$e \in E^{\alpha}(t_i) \implies e \in E^{\alpha_0}(s_i) \ \text{ and } \ F(e) = F_0(e) \qquad (12)$$

for any $i = 0, 1, \ldots$

2. The chain $\alpha$ is said to be observable with respect to $\alpha_0$ iff every $e \in E^{\alpha}(t_i)$ is observable for all $i = 0, 1, \ldots$
and all event sequences $e$.

Remark: If $\alpha$ is stochastically similar to $\alpha_0$ with respect to some partition $P$, the definition is
simplified by virtue of the fact that $P$ constrains the sequence $t$ in terms of $s$. Thus, if $\alpha = \alpha_R$, when
$\alpha_0$ is in state $s_i \in S_0$, $\alpha_R$ is in state $(s_i, s_j) \in S_R$ (for some $s_j$) by construction. This is true for all
event sequences $e$ used to generate $\alpha$. Therefore, in view of (2) and (3), the observability condition for
$\alpha_R$ with respect to the nominal chain $\alpha_1$ becomes

$$E^{\alpha_R}((s_i, s_j)) \subseteq E^{\alpha_1}(s_i) \quad \text{for all } s_i \in S_1,\ (s_i, s_j) \in S_R \qquad (13)$$

or equivalently,

$$D_R((s_i, s_j), e) \neq 0 \implies D_1(s_i, e) \neq 0 \quad \text{for all } e \in E_R \qquad (14)$$

In the example of Figure 1, if the M/M/1/1 chain is observed, then RAC state $s_{01}$ has an
unobservable transition of rate $\mu$ to state $s_{00}$, since the nominal state corresponding to $s_{00}$ ($s_0 \in S_1$) has no
feasible $d$ transition (no departure can occur when the system is empty). Note that if the M/M/1/2
chain were observed, then all events/transitions would be observable.
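
The observability check for this example can be sketched as below: a RAC transition is flagged when the event moves the RAC state but is infeasible in the observed chain. The encoding is our own assumption (dictionaries keyed by state and event, arrivals to a full queue modelled as feasible self-loops, equal event rates in both chains so that only feasibility needs checking).

```python
def unobservable(D_obs, D_other, rac_states, events):
    """List (rac_state, event) pairs violating the observability condition.

    rac_states hold pairs (observed-chain state, other-chain state).
    """
    bad = []
    for (s_obs, s_other) in rac_states:
        for e in events:
            feasible_obs = D_obs.get((s_obs, e)) is not None
            feasible_other = D_other.get((s_other, e)) is not None
            # the RAC reacts to e, but the observed chain never produces e here
            if feasible_other and not feasible_obs:
                bad.append(((s_obs, s_other), e))
    return bad


D1 = {(0, "a"): 1, (1, "a"): 1, (1, "d"): 0}                       # M/M/1/1
D2 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2, (1, "d"): 0, (2, "d"): 1}  # M/M/1/2
events = ["a", "d"]

# Observing the M/M/1/1 chain: RAC states are (nominal, perturbed)
bad_nominal = unobservable(D1, D2, [(0, 0), (0, 1), (1, 1), (1, 2)], events)
# Observing the M/M/1/2 chain instead: the same RAC states with roles swapped
bad_swapped = unobservable(D2, D1, [(0, 0), (1, 0), (1, 1), (2, 1)], events)
```

The first call flags only the departure out of the state with an empty nominal queue and one perturbed customer; the second call flags nothing, matching the two observations in the text.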

The definition above specifies sufficient conditions for constructing a realisation of $\alpha$ from an
observed realisation of $\alpha_0$. The feasible set of a state $s \in S$, along with $F(e)$ for all $e \in E^{\alpha}(s)$, define
the parameter of the holding time distribution (which is necessarily exponential) of the state, as well
as the distribution of the type of the next event. Thus we can view observability as a condition which
guarantees that the sequence of holding times and events which are observed in the realisation of $\alpha_0$
have the correct distributions to be used in constructing a realisation of $\alpha$ (see [2]).

3  FULLY  OBSERVABLE  REDUCED  AUGMENTED  CHAINS 

As stated previously, our goal is the construction of a RAC, $\alpha_R$, which is both observable with respect to
$\alpha_1$ and stochastically similar to $\alpha_2$. While $\alpha_R$, as defined in Section 2.2, is always stochastically similar
to $\alpha_2$, it will often not be observable with respect to $\alpha_1$ due to the constraining nature of the observability
conditions. In this section, we show that, subject to some general conditions, we can transform $\alpha_R$ to
a new RAC, $\alpha'_R$, which is fully observable with respect to $\alpha_1$, and retains a modified form of stochastic
similarity, which we shall call $\zeta$-similarity. In brief, we decompose the states of the RAC in a way that
allows us to both eliminate the unobservable transitions and express the perturbed state probabilities
in terms of aggregate states created by the decomposition.
3.1  $\zeta$-Similarity

We begin by extending the definition of stochastic similarity given in section 2.2.1.

Definition: Let $\alpha = \{S, E, D, F\}$ and $\alpha_0 = \{S_0, E_0, D_0, F_0\}$ be two Markov chains with stationary
state probability vectors $\pi$ and $\pi_0$ respectively, and $\dim(S_0) = N$. The Markov chain $\alpha$ is said to be
$\zeta$-similar to $\alpha_0$ with respect to $V = \bigcup_{i=1}^{N} V_i$ iff there exists a set $V \subseteq S$ and a constant $\zeta \in (0, 1]$ such
that:

$$\pi(V_i) = \zeta\, \pi_0(s_i) \quad \text{for all } s_i \in S_0$$

The next Lemma is a generalisation of Lemma 1, and establishes conditions under which two chains
are $\zeta$-similar. In what follows, given a state space $S$, $s \in S$, and $A \subseteq S$, we shall use $Q(s, A)$ to denote
$\sum_{t \in A} Q(s, t)$. We shall also denote the complement of $A$ with respect to $S$ by $\bar{A}$.

Lemma 2: Let $\alpha = \{S, E, D, F\}$ and $\alpha_0 = \{S_0, E_0, D_0, F_0\}$ be two Markov chains with infinitesimal
generators $Q$ and $Q_0$ respectively, and $\dim(S_0) = N$. Let $P$ be a partition of $S$ such that

$$P = V \cup W = \{V_i\} \cup \{W_i\}, \quad i = 1, \ldots, N$$

and the following conditions hold for all $i = 1, \ldots, N$:

(C1) $Q(s, \bar{W}_i) = Q(s, V_i)$, for all $s \in W_i$

(C2) $Q(s, V_i) + Q(s, W_i) = Q_0(s_k, s_i)$, for all $s \in V_k$, $k \neq i$

(C3) $W_i$ is not an absorbing aggregate state

Then, $\alpha$ is $\zeta$-similar to $\alpha_0$ with respect to $V$, i.e.

$$\pi(V_i) = \zeta\, \pi_0(s_i)$$

and $\zeta$ is given by

$$\zeta = \pi(V) = 1 - \pi(W)$$

Proof: (by flow balancing around each $V_i$ and $W_i$; see [2]).

Remark: Lemma 1 may be viewed as a special case of Lemma 2, where $V = S$ and $V_i = p_i$.
In this case $\zeta = 1$ and condition (C2) reduces to (7). Note that no assumptions are made regarding
transitions originating in each $W_i$, except that any terminal states lying outside $W_i$ must belong to the
corresponding $V_i$.

In what follows, we will decompose $S_R$, the state space of $\alpha_R$ defined in section 2.2, so as to define
a partition satisfying the conditions of Lemma 2, and also where all unobservable transitions originate
within a $W_i$ set. Using this partition, we define a transformation of $\alpha_R$ yielding a new RAC, $\alpha'_R$, which
is observable with respect to $\alpha_1$ (nominal chain) and $\zeta$-similar to $\alpha_2$ (perturbed chain).


3.2  Decomposing  the  RAC:  Active  and  Passive  States 

Without loss of generality, we assume that $\alpha_1$ is the nominal chain (recall that we denote the nominal
and perturbed chains, and the RAC by $\alpha_1, \alpha_2, \alpha_R$, respectively). Let $U_i$ denote the set of RAC states in
the composite state $c_i$ (equivalently, the $i$th column) defined in (9) which emit unobservable transitions,
and let $R_i = c_i - U_i$.

Let us attach a binary indicator to the state of $\alpha_R$ which takes on two values, active and passive, and
is defined as follows. We assume the state is initially active at the start of the sample path. Entering any
$U_i$ while the state is active causes a switch to the passive state. The indicator remains passive until the
system enters the $R_i$ corresponding to the $U_i$ which initiated the passive state. At this point it returns
to the active state. Note that this indicator has no effect on the evolution of the original state vector.
Further, entering any $U_j$ while in the passive state has no effect on the binary indicator.
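
The indicator can be traced along a RAC state sequence with a few lines of code. This is a sketch under our own encoding: each column $i$ is described by the set of its states that emit unobservable transitions and the set of its remaining states, and the sample trajectory is made up for illustration.

```python
def label_active(trajectory, U, R):
    """Return a list of booleans, True where the indicator is active.

    U[i]: states of column i that emit unobservable transitions.
    R[i]: the remaining states of column i.
    """
    labels = []
    passive_col = None                     # column whose U set started the passive period
    for state in trajectory:
        if passive_col is None:
            for i, u_states in U.items():
                if state in u_states:      # entering some U_i while active
                    passive_col = i
                    break
        elif state in R[passive_col]:      # entering the matching R_i ends the passive period
            passive_col = None
        # entering any other U_j while passive changes nothing
        labels.append(passive_col is None)
    return labels


# M/M/1/1, M/M/1/2 example: only column 1 contains a state with an unobservable transition
U = {0: set(), 1: {(0, 1)}, 2: set()}
R = {0: {(0, 0)}, 1: {(1, 1)}, 2: {(1, 2)}}
trajectory = [(0, 0), (1, 1), (1, 2), (0, 1), (1, 2), (0, 1), (0, 0), (1, 1)]
labels = label_active(trajectory, U, R)
```

In this made-up trajectory the indicator turns passive on the first visit to (0, 1) and stays passive, through a second visit to (0, 1), until the state (1, 1) is entered.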

The RAC states $s \in S_R$ are thus decomposed into distinct component states $\{s^a, s^p_1, s^p_2, \ldots\}$
defined by $s^a = s$, given the system is in the active state, and $s^p_i = s$, given the system is in the passive
state as a result of entering $U_i$.

Based on this decomposition, let us define an aggregate state $W_i$ as the set of all $i$th passive
components of states $s \in S_R$, i.e.

$$W_i = \bigcup_{s \in S_R} s^p_i \quad \text{for } i = 1, 2, \ldots$$
Figure 2: The RAC State Trajectory Decomposition


Figure 3: Active and Passive State Space Decomposition

In effect, the RAC enters $W_i$ when it enters $U_i$ while in the active state, and remains in $W_i$ until the next
visit to $R_i$. Thus, the RAC state trajectory can be viewed as a sequence of active segments connected
by visits to a single $W_i$ (Figure 2). Let us also use $V_i$ to denote the set of all active components of states
$s \in R_i$, and let $V = \bigcup_i V_i$. This defines the most important decomposition of $S_R$, into active and passive
states (see Figure 3), as

$$S_R = \bigcup_{s} s^a \cup \bigcup_{s, i} s^p_i = \bigcup_i V_i \cup \bigcup_i W_i = V \cup W \qquad (15)$$

The next result establishes the fact that this decomposition of $S_R$ satisfies the conditions of Lemma
2, and, therefore, allows $\alpha_R$ to be $\zeta$-similar to $\alpha_2$ with respect to the set of active states, $V$.

Lemma 3: Given the partition $V \cup W$ defined by (15), if $V_i \neq \emptyset$ for all $i$, then $\alpha_R$ is $\zeta$-similar to
$\alpha_2$ with respect to $V$, i.e.

$$\pi_R(V_i) = \zeta\, \pi_2(s_i), \quad \text{with } \zeta = \pi_R(V) = 1 - \pi_R(W)$$

Proof: (using Lemma 2; see [2]).

Remark: Lemma 3 establishes $\zeta$-similarity between $\alpha_R$ and $\alpha_2$ with respect to the set of active
states $V$. The portions of the nominal trajectory during which the system occupies active states (elements
of $V$) effectively constitute a realisation of the perturbed system (this is similar to the "cut-and-paste"
idea of [7]). Thus we use observations made during these portions to estimate the perturbed state
probabilities. The fraction of the total observation interval which these portions constitute is given by
$\zeta$, where $0 < \zeta \leq 1$. For the fastest possible convergence of our estimates, we clearly want $\zeta$ as close to
1 as possible.
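
The estimate described in this remark amounts to dividing the active time spent in each column by the total active time. A minimal sketch follows; the record format (RAC state, holding time, active flag) and the numbers in the sample path are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def estimate_perturbed(path):
    """Estimate zeta and the perturbed state probabilities from a labelled path.

    path: list of ((nominal_state, perturbed_state), holding_time, is_active).
    """
    total = sum(tau for _state, tau, _active in path)
    active_time = {}
    for (_s1, s2), tau, active in path:
        if active:                         # only active portions are used
            active_time[s2] = active_time.get(s2, 0.0) + tau
    v = sum(active_time.values())
    zeta = v / total                       # fraction of time spent in active states
    probs = {s2: t / v for s2, t in active_time.items()}
    return zeta, probs


# Made-up sample path: 7 time units active, 3 passive
path = [((0, 0), 4.0, True), ((1, 1), 2.0, True), ((1, 2), 1.0, True),
        ((0, 1), 1.0, False), ((1, 2), 2.0, False)]
zeta_hat, pi2_hat = estimate_perturbed(path)
```

Here the passive portions are simply discarded, so a smaller $\zeta$ means that less of the observed path contributes to the estimate.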

Note that, by construction, all unobservable transitions originate in $W$ (if none exist, then $W = \emptyset$
and $\zeta = 1$). Thus, our next objective is to transform $\alpha_R$ so as to eliminate all unobservable transitions
without violating (C1)-(C3).


3.3  The Observability Transformation

In this section we present our main result which states that under certain general conditions, there
exists a transformation of $\alpha_R$ yielding a new RAC, $\alpha'_R$, which is both observable with respect to $\alpha_1$ and
$\zeta$-similar to $\alpha_2$.

We begin by defining a state transition transformation for a Markov chain $\alpha = \{S, E, D, F\}$. Let
$\Phi$ denote the set of all state transitions defined in $\alpha$, i.e.

$$\Phi = \{ ((s, t), e) : s, t \in S,\ e \in E,\ D(s, e) = t,\ F(e) > 0 \}$$

Then, a state transition transformation is defined to be a mapping

$$T : \Phi \to (S \times S) \cup \emptyset$$

where the mapping to $\emptyset$ corresponds to removal of a transition $((s, t), e)$. For simplicity, we shall limit
ourselves to transforming transitions caused by a single event between each pair of states, and not
affecting the transition rate of this event. Thus, we denote this transformation by

$$T(s, t) = (u, v) \quad \text{or} \quad T(s, t) = \emptyset$$

where $(u, v) \in (S \times S)$, and the associated event is implied.

Now, we seek a transformation $T^*$ which, when applied to all state transitions of $\alpha_R$, generates a
RAC observable with respect to $\alpha_1$ and also $\zeta$-similar to $\alpha_2$.

Theorem 2: Let $\Phi_R$ be the set of state transitions in $\alpha_R$, and $T^*$ a transformation applied to
$(s, t) \in \Phi_R$ such that

$$T^*(s, t) = \begin{cases} \emptyset & \text{if } (s, t) \text{ is unobservable} \\ (s, w),\ w \in (r_j \cap V_i) & \text{if } s \in W_i,\ t \in r_j,\ i, j = 1, \ldots, N \end{cases}$$

If $V_i \neq \emptyset$ for all $i$, the resulting RAC, $\alpha'_R$, is observable with respect to $\alpha_1$ and $\zeta$-similar to $\alpha_2$ with
respect to $V$.

Proof: (see [2]).

In Figure 4 we show the application of this transformation to the example of Figure 1.


Figure 4: Observability Transformation of the M/M/1/1, M/M/1/2 RAC


Remark: Note that all but one of the conditions required are met either directly or indirectly by
the construction of $\alpha_R$. The only "real" condition is that $V_i \neq \emptyset$ for all $i$. This is equivalent to requiring
that for each state $s \in S_2$, there exist at least one state $t \in S_1$ such that $E^{\alpha_2}(s) \subseteq E^{\alpha_1}(t)$. Also, note that
simply removing the unobservable transitions is generally not sufficient since (as in our example; see
Figure 4) it may make one or more $W_i$ sets absorbing.
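
The condition stated in this remark is easy to test mechanically. The sketch below assumes the same hypothetical dictionary encoding of the destination maps used for illustration elsewhere (a key is present exactly when the event is feasible in that state) and reads the inclusion as "subset or equal".

```python
def feasible_sets(D, states):
    """Feasible event set of each state, read off the keys of the destination map."""
    return {s: {e for (u, e) in D if u == s} for s in states}


def active_sets_nonempty(D1, S1, D2, S2):
    """True if every perturbed state's feasible set is contained in some nominal state's."""
    E1, E2 = feasible_sets(D1, S1), feasible_sets(D2, S2)
    return all(any(E2[s] <= E1[t] for t in S1) for s in S2)


D1 = {(0, "a"): 1, (1, "a"): 1, (1, "d"): 0}                            # M/M/1/1
D2 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2, (1, "d"): 0, (2, "d"): 1}  # M/M/1/2
ok = active_sets_nonempty(D1, [0, 1], D2, [0, 1, 2])

# Hypothetical counterexample: a nominal chain in which departures are never feasible
D1_no_departures = {(0, "a"): 1, (1, "a"): 1}
not_ok = active_sets_nonempty(D1_no_departures, [0, 1], D2, [0, 1, 2])
```

For the queueing example the condition holds; in the counterexample no nominal state ever exhibits a departure, so the perturbed states with a feasible departure cannot be matched.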


4  Extension  To  Semi-Markov  Processes 

We have investigated extensions of our method to semi-Markov processes via two directions. The first
utilises a discrete-time Markov chain imbedded in a continuous-time semi-Markov process. This only
requires extending our existing methodology to discrete-time Markov chains. The second approach
involves applying our continuous-time approach directly, simply relaxing the requirement that the event
streams constitute Poisson processes. For certain restricted classes of semi-Markov processes, we obtain
results equivalent to the pure Markovian case.

4.1  Imbedded  RACs 

The imbedded Markov chain approach is well known. We need only extend our augmented chain method
to discrete-time Markov chains. While there are some complications due to the differing normalisation
conditions, this can be done. In Section 5 we apply the imbedded chain approach to the M/GI/1/K
queueing system. We note in passing that the utility of the imbedded chain approach rests on the
ability to relate characteristics of the imbedded chain to those of the continuous-time chain in which
it is imbedded. In the case of the M/GI/1/K system, we are fortunate in that the stationary state
probabilities of the (imbedded) discrete-time and continuous-time systems are identical. This is generally
not the case.


4.2  Relaxation  of  the  Markovian  Assumption 

In this approach, which we apply to the GI/M/1/K system in Section 5, we simply relax the Markovian
requirement for one or more of the event processes (allowing them to be arbitrarily distributed). In some
cases (e.g. the GI/M/1/K system) the efficacy of the method is unaffected by this relaxation. While
this represents ongoing research, the basic idea can be outlined as follows.

Consider the problem of constructing a perturbed realisation in a simulation environment. Given
an initial state, our task can be viewed as one of constructing a "stochastically correct" sequence of state
holding-times $(\tau_0, \tau_1, \ldots)$, with associated terminating events $(e_0, e_1, \ldots)$. Given a model of the system,
this event sequence uniquely determines the state sequence. At each iteration $i$, we use $e_i$ to determine
the next state, and repeat the process. In a simulation environment, we can generate a $(\tau_i, e_i)$ pair which
is "stochastically correct" by generating an exponentially distributed random number $t_\alpha$ for each feasible
event $e_\alpha$ (parameterised by the event rate) and setting $\tau_{i+1} = \min_\alpha \{t_\alpha\}$ and $e_{i+1}$ = the associated event.
The augmented chain essentially lets the nominal system perform this operation. If the current nominal
state has the same feasible set as the current perturbed state (in the construction) then the observed
$(\tau_i, e_i)$ pair has the appropriate stochastic characteristics required by the perturbed realisation. This
is what occurs when the RAC is in the active state. If the nominal feasible set does not match that
of the perturbed state, we suspend the perturbed construction until the nominal system enters a state
which does match that of the perturbed state, at which point we proceed as before. Suspension of
the construction corresponds to the RAC entering the passive state via some $U_i$; entering a nominal
state where we can restart the construction corresponds to the RAC re-entering the active state via the
corresponding $V_i$.
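
The construction of a "stochastically correct" holding-time and event pair described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `next_event` and `simulate_mm1k` are hypothetical, the example system is an M/M/1/K queue, and an arrival to a full queue is simply treated as infeasible rather than handled through the augmented chain.

```python
import random

def next_event(rates, rng):
    """Draw one (holding_time, event) pair for a state whose feasible
    events have the given exponential rates (dict: event -> rate)."""
    # One exponential clock per feasible event; the earliest clock fires.
    samples = {event: rng.expovariate(rate) for event, rate in rates.items()}
    winner = min(samples, key=samples.get)
    return samples[winner], winner

def simulate_mm1k(lam, mu, K, n_events, seed=0):
    """Simulate an M/M/1/K queue; return the fraction of time the server is busy."""
    rng = random.Random(seed)
    n, busy_time, total_time = 0, 0.0, 0.0
    for _ in range(n_events):
        rates = {}
        if n < K:
            rates["arrival"] = lam      # assumption: no arrival event when full
        if n > 0:
            rates["departure"] = mu
        tau, event = next_event(rates, rng)
        total_time += tau
        if n > 0:
            busy_time += tau
        n += 1 if event == "arrival" else -1
    return busy_time / total_time

if __name__ == "__main__":
    print(simulate_mm1k(lam=1.0, mu=1.0, K=2, n_events=200_000))
```

With equal arrival and service rates and K = 2 the three states are equally likely in steady state, so the estimate should settle near 2/3.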

In a non-Markovian environment, things are much more difficult because the $(\tau_i, e_i)$ statistics are
not functions of the state alone but also of the elapsed times since the previous occurrence of each
non-Markovian event. In some cases, however, we can still extract nominal $(\tau_i, e_i)$ pairs with the correct
statistics. While a complete discussion is beyond the scope of this paper, this appears to be possible
only when we have no more than one non-Markovian event process active at any time, when each $U_i$
with an unobservable non-Markovian event is reachable only by transitions corresponding to that event,
and where each corresponding $V_i$ is reachable by transitions corresponding to the same event. Such a
case is the M/GI/1/K system.


5  Experimental  Results 

In this section we provide experimental results for three variations on the single server queueing system:
the M/M/1/K, M/GI/1/K, and GI/M/1/K systems (more extensive results are contained in [2]). In
each case considered below, $K = 2$, the "GI" distribution is deterministic, and the utilisation is 1 (i.e.
$\lambda = \mu$). The perturbed systems represent a change in queue capacity of +1. The performance measures
considered are: the utilisation, $U$, the mean queue length, $N_q$, and the mean delay (or system time),
$D$. We applied the following variations of our augmented chain approach:


1.  URAC/K, where we use the original, unobservable RAC of Section 2 and handle unobservable
    transitions by generating artificial events, when required, using a random number generator
    parameterised by the event rate, which we assume is known.

2.  URAC/E, which is identical to the URAC/K case except that we assume the event rate is
    unknown; thus we estimate it using observations made in the nominal path.

3.  IRAC, where we use an imbedded discrete-time RAC.

4.  TRAC, where we use the observability transformation of Section 4 to obtain a fully
    observable RAC.

5.  SIM, which is a straightforward simulation of the perturbed chain.


Figure 5: M/M/1/2 Utilization Estimates

Figure 6: GI/M/1/2 Mean-Queue-Length Estimates

Figure 7: M/GI/1/2 Mean-Queue-Length and Delay Estimates

While all these techniques can be applied to Markov processes, not all can be applied to the same class
of semi-Markov processes (see [2]).

For the M/M/1/2 system, we applied the URAC/K, URAC/E, TRAC, and SIM methods. In Figure
5 we show the resulting estimates of perturbed utilisation as a function of the number of nominal arrivals.
Also  shown  are  the  corresponding  results  using  a  straightforward  simulation  of  the  perturbed  system. 
All  curves  are  the  average  of  10  runs.  Note  that  there  is  little  degradation  associated  with  estimating 
the  artificial  parameter  from  the  nominal  path  as  compared  to  a  priori  knowledge.  Also  note  that  the 
convergence  of  the  observability-transformation  method  is  slower  than  the  other  methods,  indicating 
that  (  <  1. 

We applied the URAC/K, URAC/E, and SIM methods to the GI/M/1/2 system and the
IRAC, TRAC, and SIM methods to the M/GI/1/2 system; the resulting estimates (averages of 10 runs)
of perturbed mean-queue-length and delay (for the M/GI/1/2 system only) for these two systems are plotted versus
the number of arrivals in Figures 6 and 7, respectively. The convergence of all methods is comparable.

Acknowledgement:  The  authors  acknowledge  many  useful  discussions  with  Don  Towsley. 
Abstract

In this paper a model of a shared memory multiprocessor that executes fork-join
parallel programs as a bulk arrival $M^X/M/c$ queueing system is developed. Here a fork-join
job is one that consists of a set of $X$ tasks. All of the tasks arrive simultaneously
to the system and the job is assumed to complete when the last task completes. We
develop tight upper and lower bounds for the mean response time of such programs
when the scheduling discipline is processor sharing under the assumptions of exponential
task service times and a Poisson job arrival process. We study two processor sharing
policies, one called task scheduling processor sharing and the other called job scheduling
processor sharing. The first policy schedules tasks independently of each other and
allows parallel execution, whereas the second policy schedules entire jobs as a unit and

This work was supported in part by the National Science Foundation under grant NfCS-810<2n3 and by
RADC under contract RI-t^S90X and F J02602-dl-C-0169.
thereby does not allow parallel execution of an individual program. We find that the
job  scheduling  policy  exhibits  better  performance  than  task  scheduling  only  on  systems 
with  a  small  number  of  processors,  where  the  system  is  operating  at  high  loads  and 
is  executing  programs  that  can  sustain  a  large  degree  of  parallelism.  Consequently,  in 
general,  task  scheduling  outperforms  job  scheduling.  We  also  compare  the  performance 
of  the  processor  sharing  policy  with  first  come  first  serve.  We  find  that  first  come  first 
serve  exhibits  better  performance  over  a  wide  range  of  systems.  The  paper  also  studies 
the  performance  of  processor  sharing  and  first  come  first  serve  with  two  classes  of  jobs, 
and  when  a  specific  number  of  processors  is  statically  assigned  to  each  of  these  classes. 
1  Introduction


With the advent of multiprocessors [Ost86] and programming languages that support parallel
programming (e.g., Concurrent Pascal [Han75], CSP [Hoa85], and Ada [Pyl81]), there is
increasing interest in modeling the performance of parallel programs. In this paper, we
evaluate  the  performance  of  a  particular  type  of  parallel  program,  a  fork-join  job,  on  a 
multiprocessor  consisting  of  identical  processors  when  the  service  discipline  is  processor 
sharing.  In  our  model  a  fork-join  job  is  composed  of  a  set  of  tasks  each  of  which  can  be 
scheduled  independently  of  the  others  at  any  processor.  All  tasks  in  a  given  job  arrive 
simultaneously  to  the  system.  The  job  completes  when  the  last  task  completes. 

The  performance  of  parallel  programs  such  as  fork-join  jobs  is  significantly  affected  by  the 
choice  of  policy  that  is  used  to  schedule  tasks.  We  analyze  the  performance  of  a  processor 
sharing  (PS)  policy  that  schedules  tasks  of  a  job  independently  of  each  other.  We  refer  to 
this  policy  as  task  scheduling  PS,  TS-PS.  We  compare  the  performance  of  this  TS-PS  policy 
to  that  of  a  second  PS  policy  that  schedules  entire  jobs  (as  a  single  unit)  independently 
of  each  other.  We  refer  to  this  policy  as  job  scheduling  PS,  JS-PS.  The  TS-PS  policy  is 
unaware  that  jobs  exist  whereas  the  JS-PS  policy  is  unaware  that  tasks  exist.  We  also 
compare  the  performance  of  TS-PS  and  JS-PS  to  the  first  come  first  serve  (FCFS)  policy. 
In  these  comparisons  we  consider  different  numbers  of  processors,  sizes  of  fork-join  jobs, 
multiple  classes,  and  dedicated  assignments  of  the  processors  of  the  multiprocessor  to  the 
different  classes. 

In  the  course  of  our  study,  we  develop  upper  and  lower  bounds  on  the  mean  fork-join  job 
response  times  under  TS-PS.  These  bounds  are  generally  very  tight  and  we  approximate 
the  mean  job  response  time  by  taking  the  average  of  the  two  bounds.  Analyses  of  the  other 
two  policies,  JS-PS  and  FCFS  have  already  appeared  in  the  literature  ([RTS87,NTT87]). 

We  make  the  following  observations  from  our  study. 

•  FCFS provides better performance than TS-PS or JS-PS for a wide range of workloads
and numbers of processors. It appears that the advantages that FCFS has over PS in
single processor systems carry over to multiprocessors executing parallel programs.
This carries the implication that one should choose large quantum sizes for round
robin policies operating on multiprocessors.

•  TS-PS performs better than JS-PS most of the time. However, if the number of
processors is small, the degree of parallelism high, and the processor utilization is
high, JS-PS can perform better. This same phenomenon was observed on single
processors in an earlier study [RTS87].

•  It  may  be  useful  to  partition  the  processors  in  a  multiprocessor  into  separate  pools  to 
handle  different  classes  of  jobs  rather  than  having  the  jobs  share  the  processors.  We 
observe  that  jobs  requiring  the  least  amount  of  computation  can  benefit  from  such  a 
partition. 

In the remainder of this section we briefly review earlier work and outline the remainder
of this paper. Processor sharing has been addressed in the literature in several ways since
its introduction [Kle64]. A survey of processor-sharing results may be found in [Kle76].
An exact analysis of the TS-PS policy operating on a single processor was performed by
Rommel, et al. [RTS87]. Unfortunately, the approach used in that paper does not extend
to multiple processors. This study first demonstrated that job scheduling can give better
performance than task scheduling. In addition, there is a growing literature on fork-join
queueing systems [BM85,BMT87,NT85]. Although these referenced papers analyze
fork-join jobs, their analysis differs from that in this paper in that processors are allocated
to specific tasks prior to execution. We are interested in systems where processors can be
dynamically allocated to different tasks.

The  format  of  this  paper  is  as  follows.  We  describe  the  queueing  system  under  consideration 
in  Section  2.  Section  3  contains  expressions  for  the  upper  and  lower  bounds  on  the  mean 
response  time  for  the  TS-PS  scheduling  policy  along  with  an  approximate  analysis  of  that 
policy.  This  is  followed  by  our  numerical  results  in  Section  4.  Finally,  in  Section  5  we 
summarize  the  results  of  the  paper. 
2  Model Description


We consider a system of $c$ identical processors that serve a single queue. Fork-join jobs
enter the system according to a Poisson process with parameter $\lambda$. A fork-join job consists
of $X$ tasks that can be processed independently of each other, where $X$ is a random variable
(r.v.) with probability distribution $a_i = P[X = i]$, $i = 1, 2, \ldots$. The service time required
by a task is assumed to be an exponential r.v. with parameter $\mu$ and is independent of the
service requirements of all other tasks.

We  are  interested  in  the  steady  state  behavior  of  this  system  when  operating  under  the 
task  scheduling  processor  sharing  (TS-PS)  and  the  job  scheduling  processor  sharing  (JS- 
PS)  policies.  As  described  in  section  1,  TS-PS  is  a  policy  that  performs  processor  sharing 
at  the  task  level  and  JS-PS  is  a  policy  that  performs  processor  sharing  at  the  job  level. 
Thus,  if  the  system  contains  two  jobs,  one  with  one  task,  the  other  with  three  tasks,  then 
TS-PS  provides  an  equal  amount  of  service  to  each  task  and  is  capable  of  utilizing  four 
processors.  In  this  same  example  JS-PS  sees  two  jobs,  one  whose  service  time  is  that  of  a 
single  task,  the  other  whose  service  time  is  the  sum  of  the  service  times  of  the  three  tasks. 
JS-PS  provides  equal  service  to  the  two  jobs  and  is  only  able  to  utilize  two  processors. 

In both cases, we focus on the response time of a random job, i.e., the interval of time
measured  from  the  arrival  of  a  job  until  the  service  completion  of  the  last  task  associated 
with  that  job.  The  system  can  be  visualized  as  a  queue  for  tasks,  c  servers,  and  a  waiting 
area  for  tasks  that  have  completed  service  but  are  awaiting  the  completion  of  the  last 
task  associated  with  the  job  (Figure  1).  This  last  queue  is  sometimes  referred  to  as  the 
synchronization  queue.  We  denote  this  response  time  as  T. 


3  Analysis 

In this section we concern ourselves with obtaining the mean response time $E[T]$ under
both TS-PS and JS-PS. We consider JS-PS first as it is the simplest to analyze.
3.1  The JS-PS policy


Let $L$ denote the number of jobs in the system under JS-PS. The distribution of $L$ is
identical to the queue length distribution of an $M/M/c$ system with arrival rate $\lambda$ and
average service time $E[X]/\mu$. Consequently, the average response time, $E[T]$, is ([All78])

$$E[T] = \frac{u^c \, (E[X]/\mu)}{c \, c! \left[ u^c/c! + (1 - u/c) \sum_{n=0}^{c-1} u^n/n! \right] (1 - u/c)} + E[X]/\mu, \qquad (1)$$

where $u = \lambda E[X]/\mu$. Note that $E[T] = E[L]/\lambda$.
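
Equation (1) is the standard $M/M/c$ mean response time, so it can be evaluated directly. The following is a minimal sketch; the function name `jsps_mean_response` and its argument names are illustrative choices, not taken from the paper.

```python
from math import factorial

def jsps_mean_response(lam, mu, mean_tasks, c):
    """Mean job response time under JS-PS, evaluated from the M/M/c formula (1).

    lam: job arrival rate, mu: task service rate,
    mean_tasks: E[X], c: number of processors.
    """
    s = mean_tasks / mu              # mean job service time E[X]/mu
    u = lam * s                      # offered load u = lam * E[X] / mu
    if u >= c:
        raise ValueError("unstable system: need lam * E[X] / mu < c")
    rho = u / c
    tail = u**c / factorial(c)
    head = sum(u**n / factorial(n) for n in range(c))
    # Probability that an arriving job finds all c processors busy (Erlang C).
    p_wait = tail / (tail + (1.0 - rho) * head)
    return p_wait * s / (c * (1.0 - rho)) + s

if __name__ == "__main__":
    print(jsps_mean_response(lam=1.0, mu=16.0, mean_tasks=16, c=2))
```

For c = 1 the expression collapses to the familiar M/M/1 result $(E[X]/\mu)/(1 - u)$, which is a convenient sanity check.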


3.2  The  TS-PS  policy 


To analyze the TS-PS policy, consider the delay that a randomly selected job incurs. Let
$J$ denote this job. Let $N$ be a r.v. that denotes the number of tasks in the system
at the time that $J$ arrives. Let $\pi_n = P[N = n]$, $n = 0, 1, \ldots$, denote the stationary
distribution of $N$. Let $t_{i,n}$ denote the mean response time of $J$ conditioned on the event
that $J$ consists of $i$ tasks and that the system contains $N = n$ tasks at the time of its
arrival, i.e. $t_{i,n} = E[T \mid X = i, N = n]$. We can write the following expression for the mean
job response time,

$$E[T \mid X = i] = \sum_{n=0}^{\infty} \pi_n t_{i,n}, \quad i = 1, \ldots \qquad (2)$$

Removal of conditioning on the number of tasks in $J$ yields

$$E[T] = \sum_{i=1}^{\infty} a_i E[T \mid X = i]. \qquad (3)$$

As described above, the number of tasks in the system is described by a Markov process.
Fortunately, the behavior of this Markov process is independent of the policy used to schedule
tasks so long as the policy does not schedule jobs based on service time information.
Consequently, the distribution of $N$ is identical to that for a bulk arrival $M^X/M/c$ system
that schedules tasks in a FCFS manner. Expressions for the queue length distribution for
this system can be found in earlier papers [CT83,Yao85,NTT87] and are omitted here.


Figure  2:  State  diagram  for  the  exact  system  when  jobs  consist  of  2  tasks. 

We focus on the conditional expectations $t_{i,n}$. We define a Markov chain with state $(I_t, M_t)$
with infinitesimal generator $Q$ where $I_t$ is the number of tasks remaining in $J$ at time $t$
after $J$ is introduced at time 0, $M_t$ is the number of tasks in the system at time $t$ that are
not part of $J$, and $Q = [q_{(i,n),(l,m)}]$ where

$$q_{(i,n),(l,m)} = \begin{cases}
i\mu_{i+n}/(i+n), & l = i - 1,\ n = m, \\
n\mu_{i+n}/(i+n), & l = i,\ m = n - 1, \\
\lambda a_{m-n}, & l = i,\ m > n, \\
-(\lambda + \mu_{i+n}), & l = i,\ m = n, \\
0, & \text{otherwise}
\end{cases} \qquad (4)$$

where

$$\mu_k = \begin{cases}
k\mu, & k = 1, \ldots, c, \\
c\mu, & k = c + 1, \ldots
\end{cases}$$

The resulting chain is transient. Figure 2 illustrates the associated state diagram when all
jobs consist of exactly 2 tasks.
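It follows from the definition of $Q$ that $t_{i,n}$ satisfies

$$\begin{aligned}
t_{1,0} &= \frac{1}{\lambda + \mu_1} + \frac{\lambda}{\lambda + \mu_1} \sum_{k=1}^{\infty} a_k t_{1,k}, \\
t_{1,n} &= \frac{1}{\lambda + \mu_{n+1}} + \frac{\lambda}{\lambda + \mu_{n+1}} \sum_{k=1}^{\infty} a_k t_{1,n+k} + \frac{n\mu_{n+1}/(n+1)}{\lambda + \mu_{n+1}} t_{1,n-1}, \quad n = 1, \ldots, \\
t_{i,0} &= \frac{1}{\lambda + \mu_i} + \frac{\lambda}{\lambda + \mu_i} \sum_{k=1}^{\infty} a_k t_{i,k} + \frac{\mu_i}{\lambda + \mu_i} t_{i-1,0}, \quad i = 2, \ldots, \\
t_{i,n} &= \frac{1}{\lambda + \mu_{i+n}} + \frac{\lambda}{\lambda + \mu_{i+n}} \sum_{k=1}^{\infty} a_k t_{i,n+k} + \frac{i\mu_{i+n}/(i+n)}{\lambda + \mu_{i+n}} t_{i-1,n} + \frac{n\mu_{i+n}/(i+n)}{\lambda + \mu_{i+n}} t_{i,n-1}, \quad i = 2, \ldots;\ n = 1, \ldots
\end{aligned} \qquad (5)$$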


Consider the last expression, for $t_{i,n}$. The first term is the average time that the system spends
in state $(i, n)$. The second term is the contribution to $t_{i,n}$ due to an arrival. The third and
fourth terms are the contributions due to a departure of a task belonging to $J$ and a task
not belonging to $J$, respectively.

We are unable to obtain a closed form solution to equation (5). As there are a countably
infinite number of unknown variables $t_{i,n}$, $i = 1, \ldots$; $n = 0, \ldots$, it is impossible to obtain
exact numerical values for these quantities. Consequently, the remainder of this section is
concerned with developing upper and lower bounds on the conditional expectations $t_{i,n}$.
These can be used to obtain upper and lower bounds for $E[T \mid X = i]$, $i = 1, \ldots$. We treat
each in turn.
Figure 3: The state diagram associated with the lower bound, 2 tasks per job.


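3.2.1  A lower bound on $E[T \mid X = i]$

We study a Markov chain with state $(I_t^{(lb)}, M_t^{(lb)})$ that yields lower bounds $t_{i,n}^{(lb)}$ on $t_{i,n}$,
$i = 1, \ldots$, $n = 0, \ldots$. This chain has infinitesimal generator $Q^{(lb)} = [q^{(lb)}_{(i,n),(l,m)}]$ where

$$q^{(lb)}_{(i,n),(l,m)} = \begin{cases}
i\mu_{i+n}/(i+n), & l = i - 1,\ n = m;\ 0 \le m \le B, \\
n\mu_{i+n}/(i+n), & l = i,\ m = n - 1;\ 0 \le m < B, \\
\lambda a_{m-n}, & l = i,\ 0 \le n < m < B, \\
\lambda \sum_{k=B-n}^{\infty} a_k, & l = i,\ m = B,\ 0 \le n < B, \\
-(\lambda + \mu_{i+n}), & l = i,\ m = n,\ 0 \le m < B, \\
-\mu_{i+B}, & l = i,\ m = n = B, \\
0, & \text{otherwise.}
\end{cases}$$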


This  Markov  chain  corresponds  to  a  system  in  which  no  more  than  B  tasks  not  belonging 
to  J  are  allowed  in.  Consequently,  this  modified  system  has  fewer  tasks  that  do  not  belong 
to  J  than  the  original  system.  The  response  time  of  J  will  be  less  in  this  system.  Figure  3 
illustrates  the  state  diagram  for  this  Markov  chain  when  each  job  consists  of  exactly  2  tasks. 


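The conditional expectations $t_{i,n}^{(lb)}$ satisfy

$$\begin{aligned}
t^{(lb)}_{1,0} &= \frac{1}{\lambda + \mu_1} + \frac{\lambda}{\lambda + \mu_1} \left( \sum_{k=1}^{B} a_k t^{(lb)}_{1,k} + \sum_{k=B+1}^{\infty} a_k t^{(lb)}_{1,B} \right), \\
t^{(lb)}_{1,n} &= \frac{1}{\lambda + \mu_{n+1}} + \frac{\lambda}{\lambda + \mu_{n+1}} \left( \sum_{k=1}^{B-n} a_k t^{(lb)}_{1,n+k} + \sum_{k=B-n+1}^{\infty} a_k t^{(lb)}_{1,B} \right) + \frac{n\mu_{n+1}/(n+1)}{\lambda + \mu_{n+1}} t^{(lb)}_{1,n-1}, \quad n = 1, \ldots, B, \\
t^{(lb)}_{i,0} &= \frac{1}{\lambda + \mu_i} + \frac{\lambda}{\lambda + \mu_i} \left( \sum_{k=1}^{B} a_k t^{(lb)}_{i,k} + \sum_{k=B+1}^{\infty} a_k t^{(lb)}_{i,B} \right) + \frac{\mu_i}{\lambda + \mu_i} t^{(lb)}_{i-1,0}, \quad i = 2, \ldots, \\
t^{(lb)}_{i,n} &= \frac{1}{\lambda + \mu_{i+n}} + \frac{\lambda}{\lambda + \mu_{i+n}} \left( \sum_{k=1}^{B-n} a_k t^{(lb)}_{i,n+k} + \sum_{k=B-n+1}^{\infty} a_k t^{(lb)}_{i,B} \right) + \frac{i\mu_{i+n}/(i+n)}{\lambda + \mu_{i+n}} t^{(lb)}_{i-1,n} + \frac{n\mu_{i+n}/(i+n)}{\lambda + \mu_{i+n}} t^{(lb)}_{i,n-1}, \quad i = 2, \ldots;\ n = 1, \ldots, B.
\end{aligned}$$

Last, $t_{i,n}$, $n > B$, is bounded from below by $t^{(lb)}_{i,B}$, i.e.,

$$t^{(lb)}_{i,n} = t^{(lb)}_{i,B}, \quad i = 1, \ldots;\ n = B + 1, \ldots$$

Thus we have the following lower bound on $E[T \mid X = i]$,

$$E[T \mid X = i] \ge \sum_{n=0}^{B} \pi_n t^{(lb)}_{i,n} + \sum_{n=B+1}^{\infty} \pi_n t^{(lb)}_{i,B}, \quad i = 1, \ldots$$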
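
Because the truncated chain has finitely many states for each $i$, the lower-bound conditional expectations can be computed numerically. The sketch below is one possible way to do so, by Gauss-Seidel sweeps over the truncated equations; it is not the procedure used by the authors. The function name `lower_bound_times`, the dictionary representation of the job-size distribution, and the fixed number of sweeps are all assumptions made for illustration, and the weighting by the stationary distribution of $N$ needed for the final bound is left out.

```python
def lower_bound_times(lam, mu, c, a, i_max, B, sweeps=2000):
    """Solve the truncated (lower-bound) equations by Gauss-Seidel sweeps.

    lam: job arrival rate, mu: task service rate, c: processors,
    a: dict mapping job size k -> probability a_k (finite support),
    i_max: largest number of tasks in the tagged job J,
    B: cap on the number of tasks not belonging to J.
    Returns t with t[i][n] for i = 0..i_max, n = 0..B (t[0][n] = 0).
    """
    def mu_k(k):
        # Total service rate when k tasks are present.
        return min(k, c) * mu

    t = [[0.0] * (B + 1) for _ in range(i_max + 1)]
    for i in range(1, i_max + 1):          # row i only needs row i - 1
        for _ in range(sweeps):
            for n in range(B + 1):
                m = mu_k(i + n)
                # Arrival of k tasks; the count of other tasks is capped at B.
                arrivals = sum(p * t[i][min(n + k, B)] for k, p in a.items())
                dep_tagged = (i * m / (i + n)) * t[i - 1][n]
                dep_other = (n * m / (i + n)) * t[i][n - 1] if n > 0 else 0.0
                t[i][n] = (1.0 + lam * arrivals + dep_tagged + dep_other) / (lam + m)
    return t

if __name__ == "__main__":
    t = lower_bound_times(lam=0.5, mu=1.0, c=2, a={2: 1.0}, i_max=2, B=20)
    print(t[2][0])
```

A useful check: with no arrivals and enough processors for every task, the tagged job's response time is the maximum of $i$ independent exponential service times, whose mean is $(1 + 1/2 + \cdots + 1/i)/\mu$.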
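Figure 4: State diagram for upper bound, B = 4.

3.2.2  An upper bound on $E[T \mid X = i]$

We study a Markov chain with state $(I_t^{(ub)}, M_t^{(ub)})$ that yields upper bounds $t_{i,n}^{(ub)}$ on $t_{i,n}$,
$i = 1, \ldots$, $n = 0, \ldots$. This chain has infinitesimal generator $Q^{(ub)} = [q^{(ub)}_{(i,n),(l,m)}]$ where

$$q^{(ub)}_{(i,n),(l,m)} = \begin{cases}
i\mu_{i+n}/(i+n), & l = i - 1,\ n = m;\ 0 \le m \le B, \\
n\mu_{i+n}/(i+n), & l = i,\ m = n - 1;\ 0 \le m < B, \\
\mu_{i+n}, & l = i,\ m = n - 1,\ B \le m, \\
\lambda a_{m-n}, & l = i,\ 0 \le n < m, \\
-(\lambda + \mu_{i+n}), & l = i,\ m = n,\ 0 \le m, \\
0, & \text{otherwise.}
\end{cases}$$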

This system behaves like the original system except when the number of tasks n not belonging
to J exceeds B. In this case, the system is not allowed to serve J, but instead only
serves  the  other  tasks.  This  continues  until  the  number  of  additional  tasks  falls  to  B  at 
which  point  the  system  behaves  like  the  original  system.  Figure  4  illustrates  the  behavior 
of  this  system  when  jobs  contain  exactly  two  tasks  and  B  =  4. 

Assume that $B \ge c$. Consider the situation where $J$ contains $i$ tasks and there are an
additional $n \le B$ tasks in the system. Now assume that $k$ tasks arrive and that $n + k > B$.
In this case, the time during which there are $B + 1$ or more additional tasks in the modified
system is equal to the length of the busy period associated with a bulk arrival $M^X/M/1$
queue with service rate $\mu c$ that is initiated by the arrival of $n + k - B$ tasks. Consequently, we
can write the following set of equations describing the expected response time of a job
conditioned on the number of tasks at the time of arrival and the number of tasks in the
arriving job,


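$$\begin{aligned}
t^{(ub)}_{1,0} &= \frac{1}{\lambda + \mu_1} + \frac{\lambda}{\lambda + \mu_1} \left( \sum_{k=1}^{B} a_k t^{(ub)}_{1,k} + \sum_{k=B+1}^{\infty} a_k \left( b_{k-B} + t^{(ub)}_{1,B} \right) \right), \\
t^{(ub)}_{1,n} &= \frac{1}{\lambda + \mu_{n+1}} + \frac{\lambda}{\lambda + \mu_{n+1}} \left( \sum_{k=1}^{B-n} a_k t^{(ub)}_{1,n+k} + \sum_{k=B-n+1}^{\infty} a_k \left( b_{n+k-B} + t^{(ub)}_{1,B} \right) \right) + \frac{n\mu_{n+1}/(n+1)}{\lambda + \mu_{n+1}} t^{(ub)}_{1,n-1}, \quad n = 1, \ldots, B, \\
t^{(ub)}_{i,0} &= \frac{1}{\lambda + \mu_i} + \frac{\lambda}{\lambda + \mu_i} \left( \sum_{k=1}^{B} a_k t^{(ub)}_{i,k} + \sum_{k=B+1}^{\infty} a_k \left( b_{k-B} + t^{(ub)}_{i,B} \right) \right) + \frac{\mu_i}{\lambda + \mu_i} t^{(ub)}_{i-1,0}, \quad i = 2, \ldots, \\
t^{(ub)}_{i,n} &= \frac{1}{\lambda + \mu_{i+n}} + \frac{\lambda}{\lambda + \mu_{i+n}} \left( \sum_{k=1}^{B-n} a_k t^{(ub)}_{i,n+k} + \sum_{k=B-n+1}^{\infty} a_k \left( b_{n+k-B} + t^{(ub)}_{i,B} \right) \right) + \frac{i\mu_{i+n}/(i+n)}{\lambda + \mu_{i+n}} t^{(ub)}_{i-1,n} + \frac{n\mu_{i+n}/(i+n)}{\lambda + \mu_{i+n}} t^{(ub)}_{i,n-1}, \quad i = 2, \ldots;\ n = 1, \ldots, B,
\end{aligned}$$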

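where $b_j$ is the average length of a busy period of an $M^X/M/1$ queue with arrival rate $\lambda$
and service rate $\mu c$ that is started by the arrival of $j$ tasks. The value of $b_j$, $j = 1, \ldots$, is
([GH76])

$$b_j = \frac{j}{\mu c} \left( 1 + \frac{\lambda E[X]}{\mu c - \lambda E[X]} \right) = \frac{j}{\mu c - \lambda E[X]}. \qquad (12)$$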


3.2.3  Approximate analysis of TS-PS

Let $T^{(lb)}$ and $T^{(ub)}$ denote the r.v.'s defined in the preceding sections that bound $T$ from
below and above. We use the following approximation for $E[T \mid X = i]$,

$$E[T \mid X = i] \approx \left( E[T^{(lb)} \mid X = i] + E[T^{(ub)} \mid X = i] \right)/2.$$

The accuracy of this approximation is high when the system load is low and/or when the
parameter $B$ takes a large value. We explore both of these effects in Table 1. Here we
evaluate the upper and lower bounds on $E[T]$ for a system of 16 processors that process
fork-join jobs containing exactly 16 tasks. The bounds are tabulated for different values of
the processor utilization, $\rho = \lambda/\mu$, and for different values of $B$. We observe that sufficient
accuracy is possible for processor utilizations up to .9 provided $B = 350$. In this case, the
maximum error incurred by the approximation is 3.6% at $\rho = .9$ and less than .05% for
$\rho < .8$. We shall use $B = 350$ throughout our studies.


            B = 50             B = 100            B = 200            B = 350
 $\rho$   lower    upper     lower    upper     lower    upper     lower    upper
  .1      55.03    55.03     55.03    55.03     55.03    55.03     55.03    55.03
  .2      56.68    56.41     56.68    56.68     56.68    56.68     56.68    56.68
  .3      59.21    59.41     59.32    59.32     59.32    59.32     59.32    59.32
  .4      62.95    63.91     63.47    63.47     63.47    63.47     63.47    63.47
  .5      68.18    71.78     70.02    70.16     70.10    70.10     70.10    70.10
  .6      75.17    87.01     80.60    81.70     81.17    81.17     81.17    81.17
  .7      84.14   121.64     97.97   105.37    101.50   101.28    101.38   101.38
  .8      95.20   225.95    126.68   175.40    142.94   148.56    144.88   145.04
  .9     108.14   792.05    173.09   601.33    242.70   389.11    271.46   291.96

Table 1: Approximation Analysis
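
The error figures quoted for the midpoint approximation follow directly from the bounds: the true value lies between the two bounds, so the estimate can be off by at most half their gap. A short worked check, using only the B = 350 entries of Table 1 at utilizations .8 and .9 (the function name `midpoint_and_error` is an illustrative choice):

```python
# (lower, upper) bounds on E[T] taken from Table 1 for B = 350.
bounds_b350 = {0.8: (144.88, 145.04), 0.9: (271.46, 291.96)}

def midpoint_and_error(lower, upper):
    """Return the midpoint estimate and its worst-case relative error."""
    mid = (lower + upper) / 2.0
    # The true value is within half the gap of the midpoint.
    return mid, (upper - lower) / 2.0 / mid

if __name__ == "__main__":
    for rho, (lower, upper) in bounds_b350.items():
        mid, err = midpoint_and_error(lower, upper)
        print(f"rho = {rho}: estimate {mid:.2f}, max error {100 * err:.2f}%")
```

At a utilization of .9 this gives a worst-case relative error of about 3.6%, in agreement with the figure stated in the text.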


4  Comparison  of  Scheduling  Policies 


In  this  section  we  compare  the  performance  of  TS-PS,  JS-PS,  and  FCFS.  Specifically,  we 
compare  the  mean  job  response  time  for  different  processor  utilizations  as  we  vary  the 
number  of  processors  and  the  job  size.  We  also  compare  the  performance  of  TS-PS  and 
FCFS on a system that serves two classes of jobs: edit jobs and batch jobs. Edit jobs
are assumed to consist of a single task whereas batch jobs consist of many tasks. Last,
we consider the effects of partitioning the processors into two sets: one to serve edit jobs
exclusively and the other to serve batch jobs exclusively. For this last study, we compare
the performance of the partitioned system under TS-PS to one where the processors are
available to all jobs under TS-PS.

4.1  Comparison  of  TS-PS,  JS-PS,  and  FCFS 

In this section we compare the TS-PS, JS-PS, and FCFS policies as a function of the
processor utilization. In Figure 5 we plot the ratio of response times of TS-PS to JS-PS,
and TS-PS to FCFS for two workloads as a function of the processor utilization, $\rho$. The
workloads consist of jobs with a constant number of tasks that is equal to the number of
processors, i.e., $X = 8$, $c = 8$ and $X = 16$, $c = 16$. The average task service time is taken
to be $1/c$. From this figure we observe that FCFS provides uniformly better response than
the two PS policies for all processor utilizations. Furthermore, TS-PS gives lower response
times than JS-PS for all processor utilizations less than 0.9. This is due to the fact that
TS-PS takes advantage of the parallelism inherent in the fork-join job. We shall observe
in Section 4.2, however, that TS-PS is not always better than JS-PS at very high utilizations.
The better performance exhibited by FCFS is due to the fact that TS-PS penalizes larger
jobs, while no such penalty exists for FCFS (a more detailed discussion of this penalty
phenomenon is given in the next section).

We also tested a workload consisting of two classes of jobs: edit jobs and batch jobs. Edit
jobs consist of a single task and batch jobs consist of 16 tasks. Let $f$ denote the fraction of
jobs that are edit jobs. We considered three mixes, $f = .5, .95, .99$, operating on a system
containing $c = 16$ processors. Figure 6 illustrates ratios of the mean job response time of
TS-PS to FCFS as a function of the processor utilization $\rho$. We observe that the FCFS
policy exhibits the best performance everywhere except when the utilization is high and
the fraction of edit jobs is high ($f = .95, .99$). In this region TS-PS provides slightly lower
response times.

This workload ($f = .95, .99$) was chosen so as to increase the variability in the job
service times in an attempt to illustrate a setting in which TS-PS outperforms FCFS. It is
surprising  that  the  difference  is  so  small.  This  is  an  indication  that  FCFS  is  a  more  robust 
policy  on  multiprocessors  that  execute  parallel  programs  than  it  is  in  a  system  where  jobs 
are  executed  serially. 

From  this  figure  we  can  also  observe  that  TS-PS  provides  only  slightly  better  service  to  edit 
jobs  than  FCFS,  but  significantly  worse  service  to  batch  jobs. 

4.2  Dependence  on  Number  of  Servers 

In the last section we observed that TS-PS provides better performance than JS-PS for all
of the examples. This appears to be at odds with observations that we noted in an earlier
study [RTS87] on the performance of TS-PS and JS-PS in a single processor system. In a
single processor system, JS-PS was shown to be uniformly better than TS-PS. This is due
to the fact that in such a system there is no possibility for parallelism and the following
occurs. Assume that there are 2 jobs, one with 1 task and one with 9 tasks. Then TS-PS
gives the job with 9 tasks 9/10 of the processor, and the job with a single task only 1/10
of the processor. However, JS-PS would give each job 1/2 of the processor. So on a single
processor, TS-PS penalizes jobs with a small number of tasks. In a multiprocessor, there
exist sufficient possibilities for parallelism so that this anomaly found in a single processor
for TS-PS does not exist.
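
The two-job example can be made concrete by computing the instantaneous service each job receives under the two policies. The sketch below is illustrative only: the function name `job_shares` is hypothetical, and it assumes that a task (under TS-PS) or a job (under JS-PS) can use at most one processor at a time, as in the model of Section 2.

```python
def job_shares(task_counts, c, policy):
    """Processors' worth of service each job receives at this instant.

    task_counts: number of unfinished tasks in each job present.
    c: number of processors.
    policy: "TS-PS" shares capacity equally among tasks,
            "JS-PS" shares capacity equally among jobs.
    """
    if policy == "TS-PS":
        total_tasks = sum(task_counts)
        per_task = min(1.0, c / total_tasks)   # a task uses at most one processor
        return [k * per_task for k in task_counts]
    if policy == "JS-PS":
        per_job = min(1.0, c / len(task_counts))  # a job runs serially
        return [per_job for _ in task_counts]
    raise ValueError(f"unknown policy: {policy}")

if __name__ == "__main__":
    for c in (1, 16):
        for policy in ("TS-PS", "JS-PS"):
            print(c, policy, job_shares([1, 9], c, policy))
```

On one processor the single-task job gets 1/10 of the capacity under TS-PS and 1/2 under JS-PS; with 16 processors TS-PS gives every task a full processor, while JS-PS leaves all but two processors idle.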

To study the effect of parallelism, we consider a workload of jobs consisting of 16 tasks and
study the performance of TS-PS and JS-PS on systems with $c = 1, 2, 4, 8, 16, 32$ processors
as a function of processor utilization. Figure 7 illustrates the results of this study, plotting
the response time ratios of TS-PS to JS-PS. We observe that TS-PS is always better than
JS-PS in multiple processor systems when processor utilization is low. However, when the
number of processors is small (< 8), there exists a utilization value, say $\rho_0$, such that system
performance is better under JS-PS when $\rho > \rho_0$. This threshold is an increasing function
of $c$, the number of processors. This results because as the number of processors increases,
the capability of sustaining parallel program execution under TS-PS increases.

4.3  Processor  Partitioning 

We now study the effect of dedicating a portion of the multiprocessor to each of the batch
and edit classes. For edit jobs we assume that the computation time is small and equivalent
to one task unit. Batch jobs are assumed to be large, consisting of fork-join tasks. The
individual tasks from either class are assumed to have the same service requirements.

In order to examine the effect of statically dedicating a portion of the multiprocessor to each
class, we compare the performance of a system composed of 16 servers, where each server
can run either class of job, to a partitioned system where some fraction of the processors
is dedicated to each class. The combined system is composed of c = 16 servers. The
partitioned system is composed of c = 16 servers such that K servers are dedicated to edit
jobs and c - K servers are dedicated to batch jobs. Our performance metric is the ratio of
the response time of the partitioned system to that of the combined system.


In this experiment the independent parameter is the combined system utilization. Our first
experiment consists of an arrival stream of 50 percent edit jobs and 50 percent batch jobs. Note
that with this arrival pattern the total computation time of edit jobs is 1/16 that of
batch jobs. The partitioned system is defined by K and the equivalent flow of jobs. We
plot our results in Figure 8.

We can observe several interesting phenomena from Figure 8. First, by dedicating only one
server to the edit jobs, K = 1, both edit and batch jobs degrade. Thus, a poor partitioning
choice negatively affects both classes of jobs. Second, improvements can be made in the edit
jobs by allocating enough additional servers, K = 2,2, to handle the computational load of
edit jobs, but this is done at the expense of the batch jobs. This phenomenon is especially
striking at high utilizations.

As the arrival rate of edit jobs relative to batch jobs increases, as shown in Figure 9 where
the proportion of edit jobs is 95 percent, we see that more servers must be dedicated to edit
jobs before the mean response time is decreased. Note that in this case the total computation
time required by edit jobs is greater than that needed by batch jobs. The result is that 9 of the
16 processors are required to reduce the edit job response times (see Figure 9). There are
regions in the figure in which the performance of both job classes decreases; however, we
observe no region in which both classes improve performance. This phenomenon has also
been reported in [NTT87] for FCFS scheduling.

Figure 10 reports the results when batch jobs are composed of 4 tasks and the workload
contains 50% batch and 50% edit jobs. The results in this figure are similar to the 95%
edit jobs and 5% batch jobs test shown in the previous figure. The reason for this is that
when batch jobs are fairly small, z = 4, and there are 50% edit jobs and 50% batch jobs in
the workload, then the total computational requirement of edit jobs is high (as in the 95%
test) for a given utilization. Therefore, edit jobs will saturate a small number of processors.
Notice that editing performs well only when the number of processors dedicated to editing
reaches 4.
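
The workload remarks for the three experiments can be reproduced with simple arithmetic. The sketch below is an illustration under stated assumptions: an edit job is one task unit (as stated in the text), a batch job in the first two experiments is taken to be 16 task units (our reading of the "1/16" remark), and the helper name `edit_work_fraction` is ours.

```python
def edit_work_fraction(p_edit, batch_tasks, edit_tasks=1):
    """Fraction of the total offered work that is due to edit jobs.

    p_edit      -- proportion of arriving jobs that are edit jobs
    batch_tasks -- task units per batch job (assumed value, see lead-in)
    edit_tasks  -- task units per edit job (one, as in the text)
    """
    edit_work = p_edit * edit_tasks
    batch_work = (1.0 - p_edit) * batch_tasks
    return edit_work / (edit_work + batch_work)


experiments = [
    ("50% edit, 16-task batch jobs", 0.50, 16),
    ("95% edit, 16-task batch jobs", 0.95, 16),
    ("50% edit,  4-task batch jobs", 0.50, 4),
]
for label, p_edit, z in experiments:
    share = edit_work_fraction(p_edit, z)
    print(f"{label}: edit jobs carry {share:.1%} of the work")
```

With these assumptions the edit class carries a small share of the work in the first experiment and more than half of it in the 95 percent experiment, which is consistent with more servers having to be dedicated to edit jobs there.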
5 Summary


We have analyzed fork-join programs as an M/M/c queueing system. We have obtained
an expression for the mean response time of a fork-join task under processor-sharing. Since
our expression is not in closed form, but given as a set of recurrent equations, we have
obtained expressions for both lower and upper bounds. Our bounds become tight as the
number of states increases.

We have compared three scheduling approaches: TS-PS, JS-PS and FCFS. We have observed
that in general FCFS outperforms both TS-PS and JS-PS. Likewise, we have observed
that TS-PS performs better than JS-PS unless the number of servers is small compared
to the number of tasks.

We have considered the interesting problem of partitioning the system into two subsystems.
Each subsystem is dedicated to one of two job classes: edit jobs and batch jobs. We
determined several interesting results. When half the jobs are edit jobs and one server is
dedicated to edit jobs exclusively, both classes experience an increase in response time.
Improvements in edit jobs always cause a reduction in the performance of batch jobs in
the partitioned system. This suggests that a parallel system should have a controllable
boundary for processor partitioning.
In this paper we study the class of acyclic fork-join queueing networks, in short "AFJQN's", that
arise in the performance analysis of parallel processing applications and flexible manufacturing
systems. We obtain the stability conditions and develop upper and lower bounds on the performance
of this class of networks under very general workload assumptions.

AFJQN's arise very naturally in parallel processing applications. Many parallel programs are
decomposed into tasks, each of which can execute on a separate processor. The division of the
parallel program into tasks can be described by a directed graph where the nodes correspond to
tasks and the directed edges represent the precedence relations between the tasks. In many cases,
the underlying graph is acyclic and the program is implemented with the use of fork and join
constructs. Briefly, a fork exists at each point in a parallel program where one or more tasks can
be initiated simultaneously. A join occurs whenever a task is allowed to begin execution following
the completion of one or more other tasks. Forks and joins are reflected in the underlying
computation graph in the following manner. A task that has one or more outgoing edges corresponds
to a fork. A task with one or more incoming edges corresponds to a join. These are exemplified by
the parbegin and parend constructs that are available in parallel programming languages such
as Concurrent Pascal [Br 75], Communicating Sequential Processes (CSP) [Ho 78], and Ada [Py 81].
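
As a concrete illustration of the fork and join constructs just described, here is a minimal sketch in Python rather than in the languages cited above. The four-task diamond graph (A, then B and C in parallel, then D) and the helper names `task` and `run_diamond` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def task(name, inputs):
    """Stand-in for real work: records which results this task consumed."""
    return f"{name}({','.join(inputs)})"


def run_diamond():
    """Run the task graph A -> {B, C} -> D."""
    a = task("A", [])
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: B and C are both initiated once A has completed.
        future_b = pool.submit(task, "B", [a])
        future_c = pool.submit(task, "C", [a])
        # Join: D may not start before both B and C have completed.
        b, c = future_b.result(), future_c.result()
    return task("D", [b, c])


print(run_diamond())
```

The two `submit` calls play the role of parbegin and the two `result` calls play the role of parend.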

Consider a multiple processor system where each task in a specific program is mapped onto
a separate processor. The execution of a single program request can be described as follows: (i)
Upon completion of a marked task, tokens associated with the program are routed to each processor
handling the tasks that follow the marked task in the underlying computation graph; (ii) Once a
processor has received tokens from all tasks that precede a marked task in the computation graph,
this processor is allowed to execute it. Let this system be required to service a stream of requests
corresponding to different instances of that program and assume each processor executes its tasks
in the order defined by the program arrival dates. We have described, in brief, an AFJQN. Figure
1a illustrates a hypothetical parallel program using forks and joins and Figure 1b illustrates the
associated fork-join queueing network.

AFJQN's also arise naturally in the context of flexible manufacturing systems. In production
lines, objects are built by assembling multiple parts together. The successive assembly steps are
described by an acyclic graph where the nodes correspond to assembly operations and the edges to
precedence constraints between these operations. Here, a join occurs whenever all the parts to be
produced by the operations that precede a marked operation have to be available in order to begin
assembling. A fork occurs at points where several assembly operations are initiated simultaneously
(for instance at points where the production of some part is followed in the underlying graph by
several assembly operations to be done on this same part). Assume each assembly operation is
allocated to a specific machine. We have another instance of AFJQN when identifying assembly
machines with the servers of the queueing network and the parts with its customers.

Apart from the subclass of Jackson series networks, the type of queueing networks we consider
here remains basically unsolved. It can be shown that the "synchronizations" induced by the forks
and the joins destroy all nice properties like insensitivity or product form, so that every problem
becomes computationally hard. Initially, most attention focused on fork-join networks consisting
of B queues in parallel. In this case, exact solutions have been provided for B = 2 in [FH 84] and
[Ba 85]. Approximate solutions and bounds have been provided for arbitrary values of B in [BM
85], [NT 85], [TY 86] and [BMS 87]. Conditions for stability have been presented for arbitrary
values of B in [BM 85] and [Si 87]. Last, models have been developed for programs exhibiting
parallel fork-join structures that are executed on multiple processors serving a single queue in
[KW 85] and [NTT 87]. Series-parallel fork-join queueing networks have been introduced in [BM
85], where stability conditions and bounds were derived.

Several classes of stochastic ordering principles have been considered in the queueing literature
(see [St 84] for a comprehensive treatment of the issue). It was shown, for instance, that an
increased input (resp. decreased output) intensity leads to higher (resp. reduced) moments
of the waiting or response times for wide classes of queueing systems (see [Wh 81]). Another
type of ordering comes from the idea that an increased variability of either the input or the service
statistics should also lead to higher waiting or response times. This has been discussed by several
authors in the context of isolated queues (see [St 84], [Ha 84], [Wh 84], [BM 85b]). The latter
ordering principle was used in [BM 85] (resp. [BM 85b]) to compare the moments of the delays
experienced by customers traversing parallel (resp. series-parallel) fork-join queueing networks to
the related moments of product form networks. Both upper and lower bounds were derived using
this principle.

A  third  type  of  ordering  arises  when  a  set  of  random  variables  (RV’s)  are  associated.  In  this 
case  the  statistics  of  the  maximum  over  these  RV’s  are  bounded  by  the  maximum  of  the  marginals 
of  these  RV’s  .  This  approach  was  used  in  [NT  85]  and  [BMS  87]  to  develop  upper  bounds  on  the 
moments  of  the  delays  experienced  by  customers  traversing  a  parallel  fork-join  network. 

The  aim  of  this  paper  is  to  extend  the  scope  of  these  ordering  and  bounding  techniques  to 
the  class  of  arbitrary  AFJQN’s  which  are  rigorously  defined  in  Section  2.  The  equations  governing 
the  behavior  of  these  networks  are  provided  in  Section  3.  This  section  also  contains  necessary  and 
sufficient  conditions  for  the  stability  of  these  networks  under  fairly  general  statistical  assumptions. 
This stability result is based on an extension of Loynes' method [Lo 62] to this class of queueing
networks.  Bounds  based  on  convex  ordering  are  described  in  Section  4.  Although  these  arguments 
yield  upper  and  lower  bounds  on  the  moments  of  customer  delays,  tighter  upper  bounds  are  obtained 
in  Section  5  using  stochastic  ordering  properties  of  associated  RV’s.  Sections  6  and  7  are  devoted 
to  the  derivation  of  bounds  of  practical  interest  based  on  convex  ordering  and  associated  RV’s 
respectively.  All  these  bounds  exhibit  the  same  stability  condition  as  the  initial  queueing  system. 

2  Notation  and  definitions 

We are concerned with the delays that customers experience when they traverse an Acyclic
Fork-Join Queueing Network. Here the network is represented by an acyclic graph $G = (V, E)$ where $V$ is
a set of $B$ FIFO queues labeled $i = 1, \ldots, B$ and $E$ is a set of links such that $(i, j) \in E$ implies $j > i$
(such an ordering is always possible in an acyclic graph).

Define the set of immediate predecessors of queue $i$, $p(i)$, to be the set of queues that have a
direct link to queue $i$:

$p(i) = \{ j \in \{1, \ldots, B\} \mid (j, i) \in E \}$    (2.1)

and the set of immediate successors of queue $i$, $s(i)$, to be the set of queues to which $i$ has a direct
link:

$s(i) = \{ j \in \{1, \ldots, B\} \mid (i, j) \in E \}$.    (2.2)

Define the set of predecessors of queue $i$, $\pi(i)$, to be the set of queues that have a (possibly)
indirect link to queue $i$:

$\pi(i) = \bigcup_{n \ge 1} p^n(i)$    (2.3)

where $p(X)$ denotes the set of immediate predecessors of the queues of $X$, a subset of $\{1, \ldots, B\}$, and
$p^n(X)$ denotes $p(p(\ldots p(X) \ldots))$.
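
The three sets defined above are easy to compute for a small graph. The sketch below is only an illustration: the four-queue edge set `E` is a hypothetical example, the function names are ours, and the predecessor set is read as the union of the iterated immediate-predecessor sets.

```python
def immediate_predecessors(edges, i):
    """p(i): queues with a direct link into queue i."""
    return {j for (j, k) in edges if k == i}


def immediate_successors(edges, i):
    """s(i): queues that queue i links to directly."""
    return {k for (j, k) in edges if j == i}


def predecessors(edges, i):
    """All queues with a direct or indirect link to queue i,
    obtained by repeatedly applying the immediate-predecessor map."""
    seen = set()
    frontier = immediate_predecessors(edges, i)
    while frontier:
        seen |= frontier
        reached = set()
        for q in frontier:
            reached |= immediate_predecessors(edges, q)
        frontier = reached - seen
    return seen


# Hypothetical network: every link (i, j) satisfies i < j, as required.
E = {(1, 3), (2, 3), (3, 4), (2, 4)}
print(immediate_predecessors(E, 3))  # queues 1 and 2
print(immediate_successors(E, 2))    # queues 3 and 4
print(predecessors(E, 4))            # queues 1, 2 and 3
```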
We also denote as $s(0)$ the set of queues with no incoming links and as $p(B+1)$ the set of
queues with no outgoing link. It will be assumed that the numbering of queues is such that

$s(0) = \{1, \ldots, B_0\}$, $B_0 \le B$    (2.4)

and

$p(B+1) = \{B_1, \ldots, B\}$, $B_1 \le B$.    (2.5)

Observe that $p(i) = \emptyset$ if $i \in s(0)$ and $s(i) = \emptyset$ if $i \in p(B+1)$.

We associate with queue $j$, $1 \le j \le B$, a sequence $\{\sigma_n^j\}_{n=0}^{\infty}$, where $\sigma_n^j \in \mathbb{R}^+$ represents the
service requirement of the n-th customer to enter this queue. Queue $j$ behaves as a single server
FIFO queue so that an arrival pattern to this queue together with the sequence $\{\sigma_n^j\}_{n=0}^{\infty}$ fully
determine the sequence of service completion dates (using the Lindley-Loynes equations).
Definition  0 

An acyclic queueing network, as defined above, is an Acyclic Fork-Join Queueing Network if it
obeys the following rules:

(i) There is a single exogenous arrival stream with pattern $a_0 = 0 \le a_1 \le \ldots \le a_n \le \ldots$,
$a_n \in \mathbb{R}^+$. The n-th customer arrival to queue $i$, $1 \le i \le B_0$, coincides with the n-th
date of this exogenous stream. As stated above, this fully determines the sequence of
service completions in the queues $1 \le j \le B_0$.

(ii) A service completion in queue $i$ does not systematically trigger an arrival to a queue of
$s(i)$. The arrivals to queue $j$, $j > B_0$, are precisely generated as follows: assume the
sequence of service completions is known for all queues $1 \le i < j$, where $B_0 < j \le B$.
The n-th customer arrival to queue $j$, $a_n^j$, coincides with the latest of the n-th service
completions in the queues of $p(j)$. Due to the acyclic structure of the graph, this successively
defines the arrival patterns in queues $B_0 + 1, B_0 + 2, \ldots, B$.

(iii) There is a single output stream out of this network. Its n-th event coincides with the
latest of the n-th service completions in the queues $B_1, B_1 + 1, \ldots, B$.

As  it  will  be  seen  in  the  next  section,  these  three  rules  fully  determine  the  evolution  of  the  queueing 
network. 
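
The three rules can be turned directly into a small trace-driven simulation. The following sketch is an illustration under our own conventions: queues are labeled 1..B with every link (i, j) satisfying i < j, `arrivals[n]` is the n-th exogenous arrival date, `service[j][n]` is the service requirement of the n-th customer in queue j, and the three-queue example data are hypothetical.

```python
def simulate_afjqn(B, edges, arrivals, service):
    """Return (completion, departures), where completion[j][n] is the date of
    the n-th service completion in queue j and departures[n] is the n-th
    event of the output stream."""
    preds = {j: [i for (i, k) in edges if k == j] for j in range(1, B + 1)}
    has_successor = {i for (i, _) in edges}
    n_customers = len(arrivals)
    completion = {}
    # Increasing label order guarantees predecessors are processed first.
    for j in range(1, B + 1):
        done = []
        for n in range(n_customers):
            if preds[j]:
                # Rule (ii): arrival = latest n-th completion among p(j).
                arrival = max(completion[i][n] for i in preds[j])
            else:
                # Rule (i): entry queues see the exogenous stream.
                arrival = arrivals[n]
            # Single-server FIFO queue: wait for the previous customer.
            start = arrival if n == 0 else max(arrival, done[n - 1])
            done.append(start + service[j][n])
        completion[j] = done
    # Rule (iii): output = latest n-th completion among queues with no successor.
    sinks = [j for j in range(1, B + 1) if j not in has_successor]
    departures = [max(completion[j][n] for j in sinks) for n in range(n_customers)]
    return completion, departures


# Hypothetical example: queues 1 and 2 in parallel, both feeding queue 3.
edges = [(1, 3), (2, 3)]
arrivals = [0, 1, 2]
service = {1: [2, 2, 2], 2: [1, 1, 1], 3: [1, 1, 1]}
completion, departures = simulate_afjqn(3, edges, arrivals, service)
print(departures)
```

Subtracting the arrival dates from the departure dates gives the network response times of the successive customers.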

Some of the bounds discussed in the application sections 6 and 7 will only apply to certain
subclasses of AFJQN's, namely parallel and series networks. An AFJQN is said to be a parallel
one with $K \ge 2$ subnetworks with respective underlying graphs $G_k = (V_k, E_k)$, $1 \le k \le K$, if its
graph $G$ is decomposable into the $K$ disconnected subgraphs $G_1, \ldots, G_K$. An AFJQN is said to
be a series one with $K \ge 2$ subnetworks with respective underlying graphs $G_k = (V_k, E_k)$, $1 \le
k \le K$, if its graph $G$ is connected and exhibits the following property: there are $K - 1$ vertices
$1 \le i_1 < i_2 < \ldots < i_{K-1} < B$ such that there are no direct links between the vertices of $(1, \ldots, i_k - 1)$
and those of $(i_k + 1, \ldots, B)$ for all $1 \le k \le K - 1$. The graph $G_k$ is then defined as the restriction of
$G$ to the vertices $(i_{k-1} + 1, \ldots, i_k)$, where $i_0 = 0$ and $i_K = B$. Figure 2 illustrates a parallel AFJQN
and a series AFJQN.

3  Evolution  equations  and  steady  state 

For $n \ge 0$ and $1 \le i \le B$, let $\sigma_n^i \in \mathbb{R}^+$ be the service requirement of the n-th customer to
be served in queue $i$ (there is hence a zero-th customer!) and $\tau_n$ be the n-th interarrival time of the
exogenous stream: $\tau_n = a_{n+1} - a_n$, $n \ge 0$. Similarly, let $d_n^i \in \mathbb{R}^+$ be the delay between the
n-th exogenous arrival date and the beginning of the n-th service in queue $i$, and let $R_n$ be the n-th
network response time, defined as the delay between the n-th exogenous arrival and the n-th date
of the global departure process.

Lemma  1 

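Assume the network is empty at time 0. Then, for $n \ge 0$,

$d_{n+1}^j = \max\left( \max_{i \in p(j)} (d_{n+1}^i + \sigma_{n+1}^i),\; d_n^j + \sigma_n^j - \tau_n \right)$,    (3.1)

where the maximum over an empty set is zero by convention and

$d_0^j = \max_{i \in p(j)} (d_0^i + \sigma_0^i)$.    (3.2)

The n-th network response time, $R_n$, is given by

$R_n = \max_{i \in p(B+1)} (d_n^i + \sigma_n^i)$.    (3.3)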

Proof 

The boundary condition (3.2) follows from the assumption on the initial condition and from rules
(i) and (ii) that define AFJQN's. For $j$ such that $1 \le j \le B_0$, the inputs in queue $j$ coincide with
the exogenous arrivals and $d_n^j$ is thus the n-th waiting time in a FIFO queue with interarrival
sequence $\{\tau_n\}_0^\infty$ and service requirements $\{\sigma_n^j\}_0^\infty$. We have hence the classical Lindley-Loynes
equations

$d_{n+1}^j = \max(0, d_n^j + \sigma_n^j - \tau_n)$, $n \ge 0$, $1 \le j \le B_0$,    (3.4)

which is exactly equation (3.1) since $p(j) = \emptyset$.

Let $j$ be such that $p(j) \ne \emptyset$, and assume that $d_n^i$ is known for all $i \in p(j)$ so that the n-th
service completion in queue $i \in p(j)$ takes place at $a_n + d_n^i + \sigma_n^i$. According to rule (ii), the (n+1)-st
arrival to queue $j$ takes place at

$a_{n+1} + \max_{i \in p(j)} (d_{n+1}^i + \sigma_{n+1}^i)$.    (3.5)

Since the server of queue $j$ becomes available for serving the (n+1)-st customer at time

$a_n + d_n^j + \sigma_n^j$,    (3.6)

it follows that $d_{n+1}^j$ is equal to the expression in the r.h.s. of equation (3.1). Equations (3.1) and
(3.2) are the basic evolution equations of the network, from which the transient bounds of sections
4 and 5 will be derived.
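
The evolution equations lend themselves to a direct implementation. The sketch below assumes the following reading of Lemma 1: the delay of customer n+1 in queue j is the larger of (a) the largest value of delay plus service of customer n+1 over the immediate predecessors of j, and (b) the delay plus service of customer n in queue j minus the n-th interarrival time; customer 0 only sees term (a); a maximum over an empty set is zero; and the response time is the largest delay plus service over the queues with no successor. The function name and the three-queue data are hypothetical.

```python
def afjqn_delays(B, edges, tau, sigma):
    """Return (d, R): d[j][n] is the delay of customer n before starting
    service in queue j, R[n] is the n-th network response time.

    tau[n]      -- n-th interarrival time, a_{n+1} - a_n
    sigma[j][n] -- service requirement of customer n in queue j
    """
    preds = {j: [i for (i, k) in edges if k == j] for j in range(1, B + 1)}
    has_successor = {i for (i, _) in edges}
    sinks = [j for j in range(1, B + 1) if j not in has_successor]
    n_customers = len(sigma[1])
    d = {j: [0] * n_customers for j in range(1, B + 1)}
    for n in range(n_customers):
        for j in range(1, B + 1):  # predecessors have smaller labels
            # Synchronization term; zero when queue j has no predecessor.
            sync = max((d[i][n] + sigma[i][n] for i in preds[j]), default=0)
            if n == 0:
                d[j][n] = sync  # boundary condition
            else:
                own = d[j][n - 1] + sigma[j][n - 1] - tau[n - 1]
                d[j][n] = max(sync, own)  # evolution equation
    R = [max(d[j][n] + sigma[j][n] for j in sinks) for n in range(n_customers)]
    return d, R


# Hypothetical example: queues 1 and 2 in parallel, both feeding queue 3.
edges = [(1, 3), (2, 3)]
tau = [1, 1]
sigma = {1: [2, 2, 2], 2: [1, 1, 1], 3: [1, 1, 1]}
d, R = afjqn_delays(3, edges, tau, sigma)
print(R)
```

Working only with delays relative to the exogenous arrival dates, as here, is what makes the later conditioning arguments convenient, since the recursion is a maximum of sums of the driving sequences.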

The  remainder  of  this  section  is  devoted  to  the  construction  of  the  stationary  regime  of  such 
networks.  This  construction  will  be  essential  in  the  continuation  of  the  transient  bounds  to  steady 
state  bounds.  Consider  the  following  set  of  assumptions. 

$H_0$: The sequence $\{(\tau_n, \sigma_n^j, 1 \le j \le B)\}_{n=0}^{\infty}$, with values in $(\mathbb{R}^+)^{B+1}$, forms a stationary and ergodic
sequence of integrable RV's on the probability space $(\Omega, F, P)$.
Theorem 2

Let $j$ be fixed, $1 \le j \le B$. Assume $H_0$ holds and that for all $i \in p(j)$, $d_n^i$ converges weakly to a finite
and integrable RV $d_\infty^i$ when $n$ goes to $\infty$. Assume in addition that

$E[\sigma_0^i] < E[\tau_0]$, $\forall i \in \pi(j)$.    (3.7)

Then the distribution functions of the RV's $d_n^j$ converge weakly to a finite RV $d_\infty^j$ when $n$ goes to
$\infty$. More precisely, under these conditions, there exists a sequence of RV's $\delta_n^j$, $n \ge 0$, on $(\Omega, F, P)$
such that $\delta_n^j$ and $d_n^j$ are equivalent in law for all $n \ge 0$ ($d_n^j =_{st} \delta_n^j$) and $\delta_n^j$ increases pathwise to
a finite limit when $n$ goes to $\infty$.

The  proof  is  presented  in  Appendix  1. 

4  Bounds  based  on  convex  ordering 

We are now in position to prove the stochastic ordering result. Consider an AFJQN
and assume that all the RV's $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $1 \le j \le B$, are defined on the probability space
$(\Omega, F, P)$ and are all integrable.

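Let now $\{\bar{a}_n\}_0^\infty$ and $\{\bar{\sigma}_n^j\}_0^\infty$, $1 \le j \le B$, be a set of "smoother" arrival and service processes
on $(\Omega, F, P)$ in the sense that there exists a sub $\sigma$-algebra, say $G$, of $F$ such that for all $n \ge 0$,

$\bar{\tau}_n = \bar{a}_{n+1} - \bar{a}_n = E[\tau_n | G]$ a.s.    (4.1)

and for all $j$, $1 \le j \le B$,

$\bar{\sigma}_n^j = E[\sigma_n^j | G]$ a.s.    (4.2)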

These new variables are smoother than the original ones in the following sense: let $\bar{b}$ and $b$ be
two non-negative and integrable RV's on $(\Omega, F, P)$ such that

$\bar{b} = E[b | G]$ a.s.    (4.3)

Owing to Jensen's theorem for conditional expectations, (4.3) entails

$f(\bar{b}) = f(E[b | G]) \le E[f(b) | G]$ a.s.    (4.4)

for all convex nondecreasing functions $f : \mathbb{R}^+ \to \mathbb{R}^+$ such that the expectations exist. This in turn
entails that for all such $f$

$E[f(\bar{b})] \le E[f(b)]$    (4.5)

which can be rephrased in terms of the convex increasing stochastic ordering of Stoyan [St 84] as
follows:

$\bar{b} \le_{ci} b$.    (4.6)

Observe that $\bar{b}$ and $b$ have hence the same first moment and higher moments are always larger for
$b$ than for $\bar{b}$.
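
The smoothing effect of conditioning can be checked exactly on a finite probability space. In the sketch below everything is a toy assumption chosen for illustration: four equally likely outcomes, a sub sigma-algebra generated by a two-block partition, a particular random variable `b`, and two convex nondecreasing test functions.

```python
from fractions import Fraction

# Four equally likely outcomes (toy probability space).
prob = {w: Fraction(1, 4) for w in range(4)}
# A non-negative random variable b, given outcome by outcome.
b = {0: 0, 1: 4, 2: 1, 3: 5}
# Partition generating the sub sigma-algebra used for conditioning.
blocks = [[0, 1], [2, 3]]


def conditional_expectation(x, blocks, prob):
    """E[x | partition]: replace x by its average on each block."""
    out = {}
    for block in blocks:
        mass = sum(prob[w] for w in block)
        mean = sum(prob[w] * x[w] for w in block) / mass
        for w in block:
            out[w] = mean
    return out


def expectation(x, prob, f=lambda v: v):
    """E[f(x)] on the finite space."""
    return sum(prob[w] * f(x[w]) for w in x)


b_bar = conditional_expectation(b, blocks, prob)


def square(v):
    return v * v


def excess_over_two(v):
    return max(v - 2, 0)


print("means:", expectation(b_bar, prob), expectation(b, prob))
for f in (square, excess_over_two):
    print(f.__name__, expectation(b_bar, prob, f), "<=", expectation(b, prob, f))
```

Both variables have the same mean, while every convex nondecreasing test function has a smaller (or equal) expectation on the smoothed variable.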
Let $\bar{d}_n^j$ be the delay variable obtained with the new arrival and service pattern,
$j = 1, \ldots, B$. The main result of this section is the following theorem:

Theorem 3

For all $n \ge 0$ and $1 \le j \le B$,

$\bar{d}_n^j$ is integrable and $\bar{d}_n^j \le E[d_n^j | G]$ a.s.    (4.7)


Proof

Basis step

Consider the case $n = 0$. We shall show that (4.7) holds for all $j = 1, \ldots, B$ by induction on $j$.

Basis step

Consider all $j$ such that $p(j) = \emptyset$. Equation (3.2) shows that

$d_0^j = \bar{d}_0^j = 0$,    (4.8)

so that (4.7) holds.

Inductive step

Assume that the hypothesis is true for all $i$, $1 \le i < j$, where $B_0 < j \le B$. It
is plain from (3.2) that $d_0^j$ is then integrable. Applying Jensen's inequality for
conditional expectations to (3.2) yields

$E[d_0^j | G] \ge \max_{i \in p(j)} (E[d_0^i | G] + \bar{\sigma}_0^i)$,    (4.9)

so that if the predecessors of $j$ satisfy property (4.7), so does queue $j$ since (4.9)
implies then:

$E[d_0^j | G] \ge \max_{i \in p(j)} (\bar{d}_0^i + \bar{\sigma}_0^i)$.    (4.10)

This completes the proof of the basis step.

Inductive step

Assume now that the property (4.7) was established for all queues up to rank $n$.
We now show that the property holds also for $n + 1$. This is done by induction on $1 \le j \le B$.

Basis step

Consider all $j$ such that $p(j) = \emptyset$. From (3.1),

$d_{n+1}^j = \max(d_n^j + \sigma_n^j - \tau_n, 0)$,    (4.11)

so that $d_{n+1}^j$ is also integrable. Jensen's inequality together with (4.1) and (4.2)
imply that

$E[d_{n+1}^j | G] \ge \max(E[d_n^j | G] + \bar{\sigma}_n^j - \bar{\tau}_n, 0)$.    (4.12)

Hence, since (4.7) is satisfied for rank $n$, we get from (4.12) that

$E[d_{n+1}^j | G] \ge \max(\bar{d}_n^j + \bar{\sigma}_n^j - \bar{\tau}_n, 0) = \bar{d}_{n+1}^j$ a.s., $1 \le j \le B_0$,    (4.13)

so that the property is also true for rank $n + 1$.

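Inductive step

Assume (4.7) holds for all $i$, $1 \le i < j$, where $B_0 < j \le B$. We now show that the
property holds for $j$. It follows from (3.1) that $d_{n+1}^j$ is also integrable. Applying
Jensen's inequality to (3.1) and using (4.1) and (4.2), we get

$E[d_{n+1}^j | G] \ge \max\left( \max_{i \in p(j)} (E[d_{n+1}^i | G] + \bar{\sigma}_{n+1}^i),\; E[d_n^j | G] + \bar{\sigma}_n^j - \bar{\tau}_n \right)$.    (4.14)

Using now the ordering property for rank $n$, we get

$E[d_{n+1}^j | G] \ge \max\left( \max_{i \in p(j)} (E[d_{n+1}^i | G] + \bar{\sigma}_{n+1}^i),\; \bar{d}_n^j + \bar{\sigma}_n^j - \bar{\tau}_n \right)$ a.s.    (4.15)

Since the property is satisfied for the predecessors of $j$, we get that it is then
satisfied by queue $j$ too since (4.15) entails that

$E[d_{n+1}^j | G] \ge \max\left( \max_{i \in p(j)} (\bar{d}_{n+1}^i + \bar{\sigma}_{n+1}^i),\; \bar{d}_n^j + \bar{\sigma}_n^j - \bar{\tau}_n \right) = \bar{d}_{n+1}^j$.    (4.16)

This completes the induction step on $j$.

This completes the induction step on $n$ and proves the theorem.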

Remark 

Observe that theorem 3 also holds under the weaker assumptions

$\bar{\tau}_n \ge E[\tau_n | G]$, $n \ge 0$    (4.17)

and

$\bar{\sigma}_n^j \le E[\sigma_n^j | G]$, $n \ge 0$, $j = 1, \ldots, B$.    (4.18)

Corollary  4 

For all $n \ge 0$ and $i = 1, \ldots, B$,

$d_n^i \ge_{ci} \bar{d}_n^i$.    (4.19)

Proof

Due to Jensen's inequality

$E[f(d_n^i) | G] \ge f(E[d_n^i | G])$,    (4.20)

so that using equation (4.7) and the increasingness of $f$,

$E[f(d_n^i) | G] \ge f(\bar{d}_n^i)$.    (4.21)

Equation (4.19) follows now directly from (4.21).

The next corollary shows that if the network achieves steady state in the sense of Theorem 2,
the transient bounds of corollary 4 extend to steady state.

Corollary  5 

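Assume that both $\{\tau_n, \sigma_n^j, j = 1, \ldots, B\}_0^\infty$ and $\{\bar{\tau}_n, \bar{\sigma}_n^j, j = 1, \ldots, B\}_0^\infty$ satisfy the condition $H_0$ and that $d_n^j$
and $\bar{d}_n^j$ converge weakly to finite RV's $d_\infty^j$ and $\bar{d}_\infty^j$ respectively. Then

$\bar{d}_\infty^j \le_{ci} d_\infty^j$.    (4.22)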


Proof 

Assume $f(\delta_\infty^j)$ is integrable. Since $\delta_n^j \le \delta_\infty^j$, and $d_n^j =_{st} \delta_n^j$, $d_\infty^j =_{st} \delta_\infty^j$, it is then easy to prove
that $f(d_n^j)$ and $f(\bar{d}_n^j)$ are both integrable for all $n \ge 0$ so that corollary 4 entails

$E[f(\bar{\delta}_n^j)] = E[f(\bar{d}_n^j)] \le E[f(d_n^j)] = E[f(\delta_n^j)]$.    (4.23)

Letting $n$ go to infinity in the inequality

$E[f(\bar{\delta}_n^j)] \le E[f(\delta_n^j)]$    (4.24)

yields the desired result using the bounded convergence theorem.

Remark  1 

Consider a two queue series network and denote as $W_n^j$, $n \ge 0$, $j = 1, 2$, the waiting time of the
n-th customer to enter queue $j$. We have the following inductions for the RV's $W_n^j$, initialized by
the condition $W_0^j = 0$:

$W_{n+1}^1 = \max(W_n^1 + \sigma_n^1 + a_n - a_{n+1}, 0)$, $n \ge 0$    (4.25)

and

$W_{n+1}^2 = \max(W_n^2 + \sigma_n^2 + d_n - d_{n+1}, 0)$, $n \ge 0$,    (4.26)

where the RV's $\{d_n\}_0^\infty$ are the departure epochs from queue 1:

$d_{n+1} - d_n = \sigma_{n+1}^1 + \max(a_{n+1} - a_n - \sigma_n^1 - W_n^1, 0)$.    (4.27)

Observe that due to the decreasingness of the r.h.s. of (4.27), considered as a function of $W_n^1$,
we cannot derive from this any simple comparison result between $(d_{n+1} - d_n)$ and $(\bar{d}_{n+1} - \bar{d}_n)$
when using Jensen's inequality as before.

We prove in Appendix 2 that there is actually no such general ordering result by considering
two simple stationary queueing systems where an increased variability of the sequence $(\tau_n, \sigma_n)$ has
the following respective effects:

- It increases the variability of the interdeparture distribution for the first one,

- It decreases it for the second one.

This strongly suggests that the stochastic ordering result of this section, which applies to the
total delays, does not extend to the individual waiting times.

5 Bounds based on association

5.1 Association of the delays


Before entering the core of this section, we introduce some terminology and review the
properties of stochastic ordering and associated RV's that will be used in the forthcoming
analysis.

Definition 6 ([BP 75])

Real valued RV's $a_1, \ldots, a_n$ are said to be associated if

$\mathrm{cov}[h(a_1, \ldots, a_n), g(a_1, \ldots, a_n)] \ge 0$    (5.1.1)

for all pairs of increasing functions $h, g : \mathbb{R}^n \to \mathbb{R}$. Association of RV's entails the following
properties:

1. Any subset of associated RV's are associated,

2. Increasing functions of associated RV's are associated,

3. Independent RV's are associated,

4. If two sets of associated RV's are independent of one another, then their union forms a set of
associated RV's,

5. If $a_1, \ldots, a_n$ are associated RV's, then

$P[\max_{1 \le i \le n} a_i \le t] \ge \prod_{i=1}^{n} P[a_i \le t]$.    (5.1.2)
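
Property 5 can be verified exactly on a small discrete example. The sketch below is an illustration only: it takes two independent RV's X and Y, uniform on {0, 1, 2} (our choice), and forms the pair (X, X + Y), which is associated by properties 2 and 3 since both components are increasing functions of independent RV's.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]
p = Fraction(1, 3)  # X and Y are independent and uniform on `values`

# Joint law of the associated pair (a1, a2) = (X, X + Y).
outcomes = [((x, x + y), p * p) for x, y in product(values, values)]


def prob_max_le(t):
    """P[max(a1, a2) <= t], computed from the joint law."""
    return sum(weight for (a, weight) in outcomes if max(a) <= t)


def product_of_marginals_le(t):
    """P[a1 <= t] * P[a2 <= t], computed from the marginals."""
    result = Fraction(1)
    for k in range(2):
        result *= sum(weight for (a, weight) in outcomes if a[k] <= t)
    return result


for t in range(5):
    print(t, prob_max_le(t), ">=", product_of_marginals_le(t))
```

The left-hand column is never smaller than the right-hand one, which is what allows the maximum of associated delays to be bounded using only their marginal distributions.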

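We are now in position to derive the main results. The network, $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$,
are defined as in section 2. The following assumptions will be made throughout the section:

$H_1$: $\{\tau_n\}_0^\infty$ is independent of $\{\{\sigma_n^j\}_0^\infty\}$, $1 \le j \le B$,
$\{\tau_n\}_0^\infty$ is a set of independent RV's and
$\{\sigma_n^j\}_0^\infty$, $1 \le j \le B$, is a set of independent RV's.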

Lemma 7

Assume $H_1$ holds. For all $m \ge 0$, $\{d_n^j, 1 \le j \le B, 0 \le n \le m\}$ is a set of associated RV's.

Proof 

We shall actually prove the more general result that $\{d_n^j, 1 \le j \le k, 0 \le n \le m - 1\} \cup \{-\tau_n, n \ge
0\} \cup \{\sigma_n^j, n \ge 0, 1 \le j \le B\}$ is a set of associated RV's for $1 \le k \le B$, $m > 0$. This is done by
induction on $m$.

Basis step

Consider the case $m = 1$. We shall show that $\{d_0^j, 1 \le j \le k\} \cup \{-\tau_n, n \ge 0\} \cup \{\sigma_n^j, n \ge 0, 1 \le
j \le B\}$ is a set of associated RV's for all $1 \le k \le B$ by induction on $k$.

Basis step

Consider all $j$ such that $p(j) = \emptyset$. $d_0^j$ can be expressed as

$d_0^j = 0$.    (5.1.3)

Consequently, $\{d_0^j\}$, $1 \le j \le B_0$, is a set of independent RV's which along with
$\{-\tau_n, n \ge 0\} \cup \{\sigma_n^j, n \ge 0, 1 \le j \le B\}$ form a set of associated RV's according to
property 4.

Inductive step

Assume that the hypothesis is true for all $i$, $1 \le i < k$, where $B_0 < k \le B$. We now
show that it is also true for $k$. Note that $p(k) \ne \emptyset$. By definition,

$d_0^k = \max_{i \in p(k)} (d_0^i + \sigma_0^i)$    (5.1.4)

which is an increasing function of associated RV's (note that $i < k$ if $i \in p(k)$).
Therefore it follows that $\{d_0^j, 1 \le j \le k\} \cup \{-\tau_n, n \ge 0\} \cup \{\sigma_n^j, n \ge 0, 1 \le j \le B\}$
is a set of associated RV's.

This completes the proof of the basis step.
Inductive  step 

Assume  that  the  hypothesis  is  true  up  to  m.  We  now  show  that  the  hypx^thesis  holds  also  for  1. 

This  is  done  by  showing  that  the  RV'a  (d^,  l<j<k,  0<n<  rn}  U{-r„,  h  >  0}U{<’'i>  ^ 
0,  I  <  j  <  B}  are  associated  for  all  1  <  Jfc  <  5  by  induction  on  k. 

Basis  step 

We  first  show  that  {d^,  0<n<m,  1<J<  ^  ^  0.  i  ^ 

j  <  B}  U{^m+ii  i  ^  y  ^  flo}  is  a  set  of  associated  RV’s  .  By  hypothesis  we  already 
know  that  {dj,,  I  <  n  <  m,  1  <  j  <  B}  ^  0}  U{^n>  ™  ^  0.  i  <  J  ^  5}  is 

a  set  of  associated  RV’s  .  Now,  for  1  <  j  <  Bq. 

‘^m+i  =  +<^m  -  0)  (5- 1.5) 

is  an  increasing  function  of  associated  RV’s  which  proves  the  result. 

Inductive  step 

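Assume $\{d_n^j,\ 0 \le n \le m,\ 1 \le j \le B\} \cup \{-\tau_n,\ n \ge 0\} \cup \{\sigma_n^j,\ n \ge 0,\ 1 \le j \le B\} \cup \{d_{m+1}^j,\ 1 \le j \le i\}$ is a set of associated RV's for $B_0 \le i < k$ where $B_0 < k \le B$. We now show that the hypothesis holds for $k$. The expression for $d_{m+1}^k$ is

$d_{m+1}^k = \max\bigl(\max_{i \in p(k)} (d_{m+1}^i + \sigma_{m+1}^i),\ d_m^k + \sigma_m^k - \tau_m\bigr)$ (5.1.6)

which is an increasing function of associated RV's, hence the result.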

This completes the induction step on $k$ and the hypothesis is true for $k = B$.

This  completes  the  induction  step  on  m  and  proves  the  lemma. 

Remark 

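Lemma 7 holds under the weaker assumptions:

$\{\tau_n\}_0^\infty$ is independent of $\{\sigma_n^j\}_0^\infty$, $1 \le j \le B$,

$\{-\tau_n\}_0^\infty$ is a set of associated RV's and
$\{\sigma_n^j\}_0^\infty$, $1 \le j \le B$ is a set of associated RV's.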

5.2  Bounds  based  on  stochastic  ordering 

This section will mainly deal with distribution functions rather than with RV's.

Definition  8 

Let $F$ and $G$ be two distribution functions on $R$. $F$ is said to stochastically dominate $G$, $F \ge_{st} G$, iff

$F(x) \le G(x), \quad \forall x \in R.$ (5.2.1)


If $a$ and $b$ are two real valued RV's, we shall say that $a \ge_{st} b$ whenever

$P[a \le x] \le P[b \le x], \quad \forall x \in R.$ (5.2.2)

A consequence of the above definition and property 5 of associated RV's is

Lemma 9

Let $(a_1, \ldots, a_n)$ be a set of associated real valued RV's with respective distribution functions $F_1, \ldots, F_n$. Let $F$ be the distribution function of $\max(a_1, \ldots, a_n)$. Then

$F \le_{st} \prod_{i=1}^{n} F_i.$ (5.2.3)
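As an illustration of the inequality of Lemma 9, read as "the distribution function of the maximum lies above the product of the marginal distribution functions", the following sketch checks it exactly on a small hypothetical example. The pair a1 = X1, a2 = X1 + X2, with X1 and X2 independent and uniform on {0, 1, 2}, is an assumption chosen here because increasing functions of independent RV's are associated; none of these names come from the report.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X1, X2 independent and uniform on {0, 1, 2}.
# a1 = X1 and a2 = X1 + X2 are increasing functions of independent RVs,
# hence associated.
VALUES = (0, 1, 2)
OUTCOMES = [(x1, x1 + x2) for x1, x2 in product(VALUES, repeat=2)]
PROB = Fraction(1, len(OUTCOMES))  # each outcome is equally likely


def cdf_of_max(x):
    """F(x) = P[max(a1, a2) <= x]."""
    return sum(PROB for a1, a2 in OUTCOMES if max(a1, a2) <= x)


def product_of_marginals(x):
    """F1(x) * F2(x), the product of the two marginal distribution functions."""
    f1 = sum(PROB for a1, _ in OUTCOMES if a1 <= x)
    f2 = sum(PROB for _, a2 in OUTCOMES if a2 <= x)
    return f1 * f2


def lemma9_holds(grid):
    """True when F(x) >= F1(x) * F2(x) at every grid point."""
    return all(cdf_of_max(x) >= product_of_marginals(x) for x in grid)
```

Calling `lemma9_holds(range(-1, 6))` scans every point where either function can change. Exact rational arithmetic is used so that the comparison is not blurred by rounding.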

Last, we state the following obvious lemma.

Lemma  10 

Let $(F_1, \ldots, F_n)$ and $(G_1, \ldots, G_n)$ be two families of distribution functions on $R$. If $F_i \ge_{st} G_i$, $i = 1, \ldots, n$, then

$F_1 . F_2 \ldots F_n = \prod_{i=1}^{n} F_i \ \ge_{st}\ \prod_{i=1}^{n} G_i = G_1 . G_2 \ldots G_n$ (5.2.4)

and

$F_1 * F_2 * \ldots * F_n \ \ge_{st}\ G_1 * G_2 * \ldots * G_n,$ (5.2.5)

where . and * respectively denote the product and the convolution of distribution functions.
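The two closure properties of Lemma 10 can be checked on small lattice distributions. The sketch below is only an illustration under stated assumptions: distributions are given as probability mass functions on 0, 1, 2, ..., the helper names (`dominates`, `convolve`, `product_pmf`) and the sample data F1, G1, F2, G2 are hypothetical, and dominance is tested in the sense of (5.2.1).

```python
from fractions import Fraction


def pad(pmf, n):
    """Extend a pmf with zeros so that it has length n."""
    return list(pmf) + [Fraction(0)] * (n - len(pmf))


def cdf(pmf):
    """Cumulative sums of a pmf given on the lattice 0, 1, 2, ..."""
    total, out = Fraction(0), []
    for p in pmf:
        total += p
        out.append(total)
    return out


def dominates(f_pmf, g_pmf):
    """F >=_st G in the sense of (5.2.1): F(x) <= G(x) for all x."""
    n = max(len(f_pmf), len(g_pmf))
    return all(a <= b for a, b in zip(cdf(pad(f_pmf, n)), cdf(pad(g_pmf, n))))


def convolve(p, q):
    """pmf of the sum of two independent lattice RVs (convolution *)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def product_pmf(p, q):
    """pmf whose distribution function is the pointwise product of the two CDFs."""
    n = max(len(p), len(q))
    prod = [a * b for a, b in zip(cdf(pad(p, n)), cdf(pad(q, n)))]
    return [prod[0]] + [prod[k] - prod[k - 1] for k in range(1, n)]


half, quarter = Fraction(1, 2), Fraction(1, 4)
F1, G1 = [0, half, half], [half, half]                       # F1 >=_st G1
F2, G2 = [quarter, quarter, half], [half, quarter, quarter]  # F2 >=_st G2
```

With these inputs, `dominates(product_pmf(F1, F2), product_pmf(G1, G2))` and `dominates(convolve(F1, F2), convolve(G1, G2))` exercise (5.2.4) and (5.2.5) respectively.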

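In the sequel, network $\alpha$ is given as in the preceding sections. We denote as $\Sigma^j$ (resp. $T^-$) the distribution function on $R$ of the RV $\sigma_n^j$ (resp. $-\tau_n$). Notice that $\Sigma^j$ has its support on $R^+$ and $T^-$ on $R^-$.

We define a sequence $\hat{D}_n^j$, $n \ge 0$, $1 \le j \le B$ of distribution functions on $R$ by the following recursion

$\hat{D}_0^j = \prod_{i \in p(j)} (\hat{D}_0^i * \Sigma^i), \quad j = 1, \ldots, B$ (5.2.6)

and

$\hat{D}_{n+1}^j = \prod_{i \in p(j)} (\hat{D}_{n+1}^i * \Sigma^i)\ .\ (\hat{D}_n^j * \Sigma^j * T^-), \quad n \ge 0,\ j = 1, \ldots, B.$ (5.2.7)

In these definitions, the product over an empty set is always understood as the step distribution function $U$ defined by

$U(t) = 0,\ t < 0, \qquad U(t) = 1,\ t \ge 0.$ (5.2.8)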
It can be checked by induction that the distribution functions $\hat{D}_n^j$ have their support on $R^+$.

Theorem  11 

Assume $H_1$ is satisfied. Let $D_n^j$ be the distribution function of the RV's $d_n^j$, $n \ge 0$, $1 \le j \le B$. We have then

$D_n^j \le_{st} \hat{D}_n^j, \quad n \ge 0,\ j = 1, \ldots, B.$ (5.2.9)


Proof 

The proof is by induction on $n$. Here $df(a)$ denotes the distribution function of the RV $a$.

Basis step $n = 0$. This step is shown by induction on $j$.

Basis step

Consider queue $j$ where $p(j) = \emptyset$. $D_0^j = \hat{D}_0^j = U$, so that the result holds true.

Inductive step

Assume the theorem is true for $B_0 \le j < B$. We now show that it is true for $j + 1$. Note that $p(j+1) \ne \emptyset$. We have

$\hat{D}_0^{j+1} = \prod_{i \in p(j+1)} (\hat{D}_0^i * \Sigma^i) \ \ge_{st}\ \prod_{i \in p(j+1)} (D_0^i * \Sigma^i)$ (5.2.10)

(by induction hypothesis and lemma 10)

$\ge_{st}\ df\bigl(\max_{i \in p(j+1)} (d_0^i + \sigma_0^i)\bigr)$

(by lemma 7 and lemma 9 plus assumption $H_1$ which entails that $d_0^i$ and $\sigma_0^i$ are independent RV's)

$= D_0^{j+1}$

(by definition).

This proves the basis step for $n$.

Inductive  step 

Assume that the theorem is true for $n$. We now show that it is true for $n + 1$ by induction on $j$.

Basis step

We first prove it for $j \in V$ such that $p(j) = \emptyset$:

$\hat{D}_{n+1}^j = U . (\hat{D}_n^j * \Sigma^j * T^-) \ \ge_{st}\ U . (D_n^j * \Sigma^j * T^-)$ (5.2.11)

(induction assumption)

$= df(\max(d_n^j + \sigma_n^j - \tau_n,\ 0))$

(by assumption $H_1$ which entails that $d_n^j$ is independent of $-\tau_n$)

$= D_{n+1}^j.$
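This completes the basis step.

Inductive step

We now assume the theorem is true for $B_0 \le j < B$ and prove it for $j + 1$. We have

$\hat{D}_{n+1}^{j+1} = \prod_{i \in p(j+1)} (\hat{D}_{n+1}^i * \Sigma^i)\ .\ (\hat{D}_n^{j+1} * \Sigma^{j+1} * T^-) \ \ge_{st}\ \prod_{i \in p(j+1)} (D_{n+1}^i * \Sigma^i)\ .\ (D_n^{j+1} * \Sigma^{j+1} * T^-)$ (5.2.12)

(inductive hypothesis and lemma 10)

$\ge_{st}\ df\bigl(\max\bigl(\max_{i \in p(j+1)} (d_{n+1}^i + \sigma_{n+1}^i),\ d_n^{j+1} + \sigma_n^{j+1} - \tau_n\bigr)\bigr)$

(where we used that $d_{n+1}^i$ is independent of $\sigma_{n+1}^i$ and $d_n^{j+1} + \sigma_n^{j+1}$ of $-\tau_n$ due to $H_1$, then that $(d_{n+1}^i + \sigma_{n+1}^i,\ i \in p(j+1))$ and $(d_n^{j+1} + \sigma_n^{j+1} - \tau_n)$ form a set of associated RV's due to lemma 7, and finally lemma 9)

$= D_{n+1}^{j+1}$

(by definition).

This concludes the proof of the inductive step, and the proof of the theorem.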

The next result concerns the extension of the transient bounds of Theorem 11 to steady state. $H_2$ will denote the following set of assumptions:

$H_2$: Assumption $H_1$,

The sequence $\{\tau_n\}_0^\infty$ is i.i.d. with $\tau_n$ integrable,

The sequence $\{\sigma_n^j\}_0^\infty$ is i.i.d. with $\sigma_n^j$ integrable for all $j = 1, \ldots, B$.


Theorem  12 

Let $j$ be fixed, $1 \le j \le B$. Assume $H_2$ holds and that for all $i \in p(j)$, $\hat{D}_n^i$ converges weakly to a finite and integrable distribution function when $n$ goes to $\infty$. Assume in addition that

$E[\sigma_n^j] < E[\tau_n].$ (5.2.13)

Then the distribution functions $\hat{D}_n^j$ converge weakly to a finite distribution function $\hat{D}_\infty^j$ when $n$ goes to $\infty$. Denote as $D_n^j$ the distribution function of $d_n^j$. Under the foregoing assumptions, the distribution functions $D_n^j$ converge weakly to a finite distribution function $D_\infty^j$ when $n$ goes to $\infty$, and $\hat{D}_\infty^j$ stochastically dominates $D_\infty^j$, namely

$D_\infty^j \le_{st} \hat{D}_\infty^j.$ (5.2.14)

The  proof  is  found  in  Appendix  3. 

6  Applications  of  bounds  based  on  convex  ordering 


The following assumption will be taken to hold throughout the section:

$H_3$: The $B + 1$ sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$, are mutually independent.

6.1  Determinism  minimizes  response  times 

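The property that, under certain independence assumptions, deterministic interarrival times (resp. service times) minimize response times in G/G/1 queues, as shown in [St 84] and [Wh 84], can be extended to AFJQN's using Theorem 3.

Let $\bar{d}_n^j$ (resp. $\tilde{d}_n^j$), $j = 1, \ldots, B$, be the response times obtained for the constituting sequences $\{\bar\tau_n\}_0^\infty$ and $\{\bar\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$ (resp. $\{\tilde\tau_n\}_0^\infty$ and $\{\tilde\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$) respectively defined by the equations:

$\bar\tau_n = E[\tau_n], \quad n \ge 0$ (6.1.1)

$\bar\sigma_n^j = \sigma_n^j, \quad j = 1, \ldots, B,\ n \ge 0$ (6.1.2)

and

$\tilde\tau_n = \tau_n, \quad n \ge 0$ (6.1.3)

$\tilde\sigma_n^j = \sigma_n^j, \quad j \ne j_0,\ n \ge 0$ (6.1.4)

$\tilde\sigma_n^{j_0} = E[\sigma_n^{j_0}], \quad n \ge 0$ (6.1.5)

where $j_0$ is any fixed integer, $1 \le j_0 \le B$.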

Corollary  13 

For all $n \ge 0$ and $j = 1, \ldots, B$, the following inequalities hold:

$\bar{d}_n^j \le_{ci} d_n^j, \quad n \ge 0$ (6.1.6)

and

$\tilde{d}_n^j \le_{ci} d_n^j, \quad n \ge 0.$ (6.1.7)

Proof 

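Let $\bar{G}$ (resp. $\tilde{G}$) be the sub $\sigma$-field of $F$ generated by the RV's $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$ (resp. $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$, $j \ne j_0$). We first get from the independence assumption that

$\bar\tau_n = E[\tau_n | \bar{G}], \quad n \ge 0$ (6.1.8)

$\tilde\tau_n = E[\tau_n | \tilde{G}], \quad n \ge 0$ (6.1.9)

and

$\bar\sigma_n^j = E[\sigma_n^j | \bar{G}], \quad n \ge 0,\ j = 1, \ldots, B$ (6.1.10)

$\tilde\sigma_n^j = E[\sigma_n^j | \tilde{G}], \quad n \ge 0,\ j = 1, \ldots, B$ (6.1.11)

so that Theorem 3 entails

$\bar{d}_n^j \le E[d_n^j | \bar{G}], \quad n \ge 0,\ j = 1, \ldots, B$ (6.1.12)

and

$\tilde{d}_n^j \le E[d_n^j | \tilde{G}], \quad n \ge 0,\ j = 1, \ldots, B.$ (6.1.13)

Equations (6.1.6) and (6.1.7) are a mere rephrasing of (6.1.12) and (6.1.13) respectively.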

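The lower bounds (6.1.12) and (6.1.13) on $d_n^j$ extend to steady state when the constituting sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$, satisfy the assumptions of Theorem 2. Indeed, these conditions entail that both the constituting sequences $\{\bar\tau_n\}_0^\infty$, $\{\bar\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$ and $\{\tilde\tau_n\}_0^\infty$, $\{\tilde\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$ satisfy the assumptions of Theorem 2. Hence Corollary 5 applies to show that the bounds of Corollary 13 extend to steady state, namely

$\bar{d}_\infty^j \le_{ci} d_\infty^j, \quad j = 1, \ldots, B$ (6.1.14)

$\tilde{d}_\infty^j \le_{ci} d_\infty^j, \quad j = 1, \ldots, B.$ (6.1.15)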
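The effect described in this subsection can be observed on a small simulation. The sketch below is an illustration only, under assumptions that are not taken from the report: two parallel queues without predecessors, exponential service times with mean 1, a load of 0.7 per queue, and the single-queue recursion d_{n+1} = max(d_n + sigma_n - tau_n, 0). It compares the average network response time when the interarrival times are random with the one obtained when each interarrival time is replaced by its mean, as in (6.1.1), while the service times are kept identical.

```python
import random


def mean_response(taus, sigmas):
    """Average of R_n = max_j (d_n^j + sigma_n^j) for parallel queues.

    Each queue follows d_{n+1}^j = max(d_n^j + sigma_n^j - tau_n, 0), d_0^j = 0.
    """
    d = [0.0] * len(sigmas[0])
    total = 0.0
    for tau, sigma in zip(taus, sigmas):
        total += max(dj + sj for dj, sj in zip(d, sigma))
        d = [max(dj + sj - tau, 0.0) for dj, sj in zip(d, sigma)]
    return total / len(taus)


rng = random.Random(1)                     # fixed seed, for reproducibility
N, MEAN_TAU = 50_000, 1.0 / 0.7            # hypothetical load: 0.7 per queue
sigmas = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(N)]
random_taus = [rng.expovariate(1.0 / MEAN_TAU) for _ in range(N)]
constant_taus = [MEAN_TAU] * N             # tau_n replaced by its mean

r_random = mean_response(random_taus, sigmas)
r_constant = mean_response(constant_taus, sigmas)
```

Comparing `r_constant` with `r_random` gives an empirical view of the lower bound. The corollary itself is a statement about convex increasing ordering, of which a comparison of sample means is only the simplest consequence.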


6.2  Networks  in  random  environment 


The problem of determining the statistics of isolated queues with time varying interarrival times was considered in the Markovian case in [Ma 85]. For the general G/G/1 FIFO queue, bounds are also available when the variations depend upon an independent stationary and ergodic "environment" process. It was shown in [BM 86] that the waiting time statistics in such a queueing system are bounded from below by those of the same queue with the environment process kept to its mean value (see also [Ro 83]). Theorem 3 allows this result to be extended to any AFJQN $\alpha$. As in [BM 86], the environment process is assumed to be a non-negative real-valued stochastic process $V(t)$, $t \in R$ on $(\Omega, F, P)$, ergodic and stationary. Two stationary and ergodic sequences of nonnegative RV's are assumed to be given: $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$. All these RV's are assumed to be integrable, with $E[V(t)] = 1$ holding in particular. The modulation of the arrival process is obtained by accelerating time proportionally to $V$, so that the effective interarrival times in the random environment network are given by the sequence $\{\tilde\tau_n\}_0^\infty$ defined by

$\tilde\tau_n = \int V(s)\,ds, \quad n \ge 0.$ (6.2.1)


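Let $\{\tilde{d}_n^j\}_0^\infty$ (resp. $\{d_n^j\}_0^\infty$) be the response times obtained for the constituting sequences $\{\tilde\tau_n\}_0^\infty$ (resp. $\{\tau_n\}_0^\infty$) and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$.

Corollary 14

If the stochastic process $V(t)$, $t \in R$ is independent of $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$, the following inequality holds for all $n \ge 0$ and $j = 1, \ldots, B$:

$\tilde{d}_n^j \ge_{ci} d_n^j.$ (6.2.2)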


Proof 

Let $G$ be the sub $\sigma$-field of $F$ generated by the RV's $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$ and $\{\tau_n\}_0^\infty$. It is shown in [BM 86] that under the enforced assumptions, for all $n \ge 0$,

$E[\tilde\tau_n | G] = \tau_n.$ (6.2.3)

Equation (6.2.2) is now obtained as a direct consequence of Theorem 3.
Consider a fixed queue $j$. Observe that under the foregoing assumptions, if $\{\tilde\tau_n\}_0^\infty$ and $\{\sigma_n^i\}_0^\infty$, $i = 1, \ldots, B$ satisfy the conditions of Theorem 2 for $j$, then $\{\tau_n\}_0^\infty$ and $\{\sigma_n^i\}_0^\infty$, $i = 1, \ldots, B$ also satisfy the conditions of Theorem 2 for $j$, so that the bounds of Corollary 14 then extend to steady state, namely

$\tilde{d}_\infty^j \ge_{ci} d_\infty^j.$ (6.2.4)


6.3  Bounds  on  parallel  networks 

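Theorem 3 also provides lower and upper bounds for the following problem, a particular case of which was considered in [BM 85]. Let $\alpha$ be any AFJQN made of $K$ AFJQ subnetworks $\alpha^1, \ldots, \alpha^K$ in parallel with respective underlying graphs $G_l = (V_l, E_l)$, $1 \le l \le K$. Denote as $R_n$ the n-th network response time:

$R_n = \max_{i \in p(B+1)} (d_n^i + \sigma_n^i)$ (6.3.1)

for the constituting sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$. Let $R_n^l$ denote the n-th response time in the subnetwork $\alpha^l$, $1 \le l \le K$, for the constituting sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j \in V_l$:

$R_n^l = \max_{i \in p_l(B+1)} (d_n^i + \sigma_n^i)$ (6.3.2)

where $p_l(B+1)$ denotes the queues of $p(B+1)$ which belong to $V_l$. Owing to the parallel structure of $\alpha$, we have

$R_n = \max_{1 \le l \le K} R_n^l.$ (6.3.3)

Let finally $\bar{R}_n^l$ denote the n-th response time in $\alpha^l$ for the constituting sequences $\{\bar\tau_n\}_0^\infty$ and $\{\bar\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$, defined by equations (6.1.1) and (6.1.2).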

Corollary  15 
For all $n \ge 0$

$R_n \ge_{ci} \max_{1 \le l \le K} \bar{R}_n^l.$ (6.3.4)

Proof 

It was established in the proof of Corollary 13 that

$\bar{d}_n^j \le E[d_n^j | \bar{G}], \quad n \ge 0,\ j = 1, \ldots, B.$ (6.3.5)

This and Jensen's Theorem can be used in (6.3.2) to yield

$\bar{R}_n^l \le E[R_n^l | \bar{G}], \quad n \ge 0,\ l = 1, \ldots, K.$ (6.3.6)

Using now Jensen's Theorem in (6.3.3), we get

$E[R_n | \bar{G}] \ge \max_{1 \le l \le K} E[R_n^l | \bar{G}], \quad n \ge 0.$ (6.3.7)

Combining equations (6.3.6) and (6.3.7), we finally obtain

$E[R_n | \bar{G}] \ge \max_{1 \le l \le K} \bar{R}_n^l, \quad n \ge 0,$ (6.3.8)

which implies (6.3.4).

Remark 

Notice that due to our mutual independence assumption on the sequences $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$, the sequences $\{\bar{R}_n^l\}_0^\infty$, $1 \le l \le K$, are mutually independent as well. In other words, Corollary 15 allows us to derive lower bounds for the network response times that reduce to computing the maximum of $K$ independent RV's, these being the response times of subnetworks of smaller size than the initial one.

Upper bounds can also be obtained using convex ordering in the following particular case: assume the arrival process is divisible in the sense that there exist $K$ mutually independent sequences of RV's $\{\tau_n^l\}_0^\infty$, $1 \le l \le K$, which satisfy the mean condition:

$\tau_n = \frac{1}{K} \sum_{l=1}^{K} \tau_n^l, \quad n \ge 0.$ (6.3.9)


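Let $\hat{d}_n^j$ (resp. $\hat{R}_n^l$) denote the delay between the n-th arrival and the beginning of the n-th service in queue $j$ (resp. the n-th response time) in $\alpha^l$ for the constituting sequences $\{\tau_n^l\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j \in V_l$.

Corollary 16

For all $n \ge 0$

$R_n \le_{ci} \max_{1 \le l \le K} \hat{R}_n^l.$ (6.3.10)

Proof

Let $G$ be the sub $\sigma$-algebra of $F$ generated by the RV's $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$. We get from the exchangeability of the RV's $\tau_n^l$, $1 \le l \le K$, and the independence assumptions that for all $n \ge 0$,

$E[\tau_n^l | G] = \tau_n, \quad 1 \le l \le K.$ (6.3.11)

Using Jensen's inequality in

$\max_{1 \le l \le K} \hat{R}_n^l = \max_{1 \le l \le K} \max_{j \in V_l} (\hat{d}_n^j + \sigma_n^j)$ (6.3.12)

we get

$E[\max_{1 \le l \le K} \hat{R}_n^l | G] \ \ge\ \max_{1 \le l \le K} \max_{j \in V_l} (E[\hat{d}_n^j | G] + \sigma_n^j).$ (6.3.13)

This together with Theorem 3 entails

$E[\max_{1 \le l \le K} \hat{R}_n^l | G] \ \ge\ \max_{1 \le l \le K} \max_{j \in V_l} (d_n^j + \sigma_n^j) = R_n,$ (6.3.14)

which completes the proof of (6.3.10).

Notice that for this upper bound too, the RV's $\hat{R}_n^l$ are mutually independent and can be obtained by considering subnetworks of smaller dimensions than the initial one. Observe that if $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$ (resp. $\{\tau_n^l\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j \in V_l$) satisfy the conditions of Theorem 2 for all $j = 1, \ldots, B$, the bounds of Corollary 15 (resp. 16) then extend to steady state, namely,

$R_\infty \ge_{ci} \max_{1 \le l \le K} \bar{R}_\infty^l$ (6.3.15)

and

$R_\infty \le_{ci} \max_{1 \le l \le K} \hat{R}_\infty^l.$ (6.3.16)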

6.4  Bounds  on  series  networks 

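Let $\alpha$ be any AFJQN made of $K$ AFJQ subnetworks $\alpha^1, \ldots, \alpha^K$ in series with respective underlying graphs $G_l = (V_l, E_l)$, $1 \le l \le K$. Owing to the series structure of the network, the subnetworks $\alpha_*^l$, $1 \le l \le K$, of $\alpha$ obtained by considering only the queues of $V_1 \cup \ldots \cup V_l$ are also in the AFJQN class. Let $R_n^{*l}$ denote the n-th response time in $\alpha_*^l$ for the constituting sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j \in V_1 \cup \ldots \cup V_l$. Let also $t_n^l$ denote the n-th interdeparture time of the output stream of $\alpha_*^l$:

$t_n^l = \tau_n + R_{n+1}^{*l} - R_n^{*l}, \quad n \ge 0.$ (6.4.1)

Owing to the series structure of $\alpha$, $R_n$ can be decomposed into the sum:

$R_n = \sum_{l=1}^{K} \rho_n^l$ (6.4.2)

where $\rho_n^l$ denotes the n-th response time in the AFJQN $\alpha^l$, for the interarrival times sequence $\{t_n^{l-1}\}_0^\infty$ and the service times sequences $\{\sigma_n^j\}_0^\infty$, $j$ in $V_l$, and where $t_n^0$ stands for $\tau_n$, $n \ge 0$.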

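Similarly, let $\bar\rho_n^l$ denote the n-th response time in the AFJQN $\alpha^l$, for the constituting sequences $\{\bar{t}_n^{l-1}\}_0^\infty$ and $\{\bar\sigma_n^j\}_0^\infty$, $j$ in $V_l$, where

$\bar{t}_n^{l-1} = E[t_n^{l-1}], \quad n \ge 0$ (6.4.3)

$\bar\sigma_n^j = \sigma_n^j, \quad 1 \le l \le K,\ j \in V_l,\ n \ge 0.$ (6.4.4)

Corollary 17

For all $n \ge 0$, the following inequality holds:

$E[R_n] \ge \sum_{l=1}^{K} E[\bar\rho_n^l].$ (6.4.5)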

Proof 

Let $G_l$ be the sub $\sigma$-algebra of $F$ generated by the RV's $\{\sigma_n^j\}_0^\infty$, $j$ in $V_l \cup \ldots \cup V_K$. Owing to the independence assumptions, we have, for all $n \ge 0$, $1 \le l \le K$,

$E[t_n^{l-1} | G_l] = E[t_n^{l-1}]$ (6.4.6)

and, for all $j \in V_l$,

$E[\sigma_n^j | G_l] = \sigma_n^j.$ (6.4.7)

Hence, Theorem 3 applied to the network $\alpha^l$ entails that, for all $n \ge 0$, $1 \le l \le K$,

$E[\rho_n^l | G_l] \ge \bar\rho_n^l.$ (6.4.8)

This together with equation (6.4.2) readily entails (6.4.5).

Observe that (6.4.5) obviously holds at steady state provided the first moments involved in this equation converge.

7  Applications  of  bounds  based  on  association 

The  condition  Hi  will  be  assumed  to  hold  throughout  the  section  so  that  the  assumptions  of 
Lemma  7  and  Theorem  12  are  satisfied. 

7.1  Bounds  on  parallel  networks 

The notations are those of section 6.3: $R_n$ (resp. $R_n^l$) denotes the n-th response time in $\alpha$ (resp. $\alpha^l$, $1 \le l \le K$) for the constituting sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, $j = 1, \ldots, B$. Under the foregoing assumptions, we have the following strengthening of corollary 16.

Corollary 18

For all $n \ge 0$

$df(R_n) \le_{st} \prod_{1 \le l \le K} df(R_n^l).$ (7.1.1)
Proof 

It was established in Lemma 7 that the RV's $d_n^j$ and $\sigma_n^j$, $j = 1, \ldots, B$, are associated. Hence, the RV's $R_n^l$, $n \ge 0$, $1 \le l \le K$, which are given by (6.3.2) in terms of increasing functions of associated RV's, are also associated, owing to property 2 of associated RV's. Equation (7.1.1) is hence a direct consequence of property 5 (equation (5.1.2)) of association.

Assume the stability condition of Theorem 2 is satisfied. (Observe that condition $H_1$ is stronger than condition $H_0$.) Then, the random vectors $(d_n^j)$, $j = 1, \ldots, B$ converge weakly to a finite random vector $(d_\infty^j)$, $j = 1, \ldots, B$ when $n$ goes to $\infty$. This in turn implies that the random vectors $(R_n^l)$, $l = 1, \ldots, K$ (resp. the RV's $R_n$) converge weakly to a finite random vector (resp. RV) $(R_\infty^l)$, $l = 1, \ldots, K$ (resp. $R_\infty$) when $n$ goes to $\infty$.

Applying now proposition (1.2.3) of [St 84] to the weakly converging sequences $df(R_n)$ and $\prod_{1 \le l \le K} df(R_n^l)$, it is plain that equation (7.1.1) extends to steady state, namely

$df(R_\infty) \le_{st} \prod_{1 \le l \le K} df(R_\infty^l).$ (7.1.2)

The upper bounds of equation (7.1.2) and the lower bounds of equation (6.3.4) are exemplified in Figure 3.
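The product-form upper bound for parallel networks can be checked exactly on a toy instance. The following sketch is an illustration under assumptions that are not in the report: two parallel queues without predecessors fed by the same arrivals, two-point laws for interarrival and service times, the recursion d_{n+1} = max(d_n + sigma_n - tau_n, 0), and the transient index n = 2. It enumerates all sample paths and compares the distribution function of max(R^1, R^2) with the product of the two marginal distribution functions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point laws, each value taken with probability 1/2.
TAU_VALUES, SIGMA_VALUES, STEPS = (1, 3), (0, 3), 2


def response_pairs():
    """Enumerate (R_n^1, R_n^2) at n = STEPS over all equally likely sample paths."""
    pairs = []
    paths = product(product(TAU_VALUES, repeat=STEPS),
                    product(SIGMA_VALUES, repeat=STEPS + 1),
                    product(SIGMA_VALUES, repeat=STEPS + 1))
    for taus, s1, s2 in paths:
        d1 = d2 = 0
        for n in range(STEPS):
            # Both queues see the same interarrival time taus[n].
            d1 = max(d1 + s1[n] - taus[n], 0)
            d2 = max(d2 + s2[n] - taus[n], 0)
        pairs.append((d1 + s1[STEPS], d2 + s2[STEPS]))
    return pairs


PAIRS = response_pairs()
WEIGHT = Fraction(1, len(PAIRS))


def cdf_network(x):
    """P[max(R^1, R^2) <= x], the network response time distribution."""
    return sum(WEIGHT for r1, r2 in PAIRS if max(r1, r2) <= x)


def cdf_product(x):
    """Product of the marginal distribution functions of R^1 and R^2."""
    f1 = sum(WEIGHT for r1, _ in PAIRS if r1 <= x)
    f2 = sum(WEIGHT for _, r2 in PAIRS if r2 <= x)
    return f1 * f2
```

On this instance `cdf_network(x) >= cdf_product(x)` at every integer x, with strict inequality at x = 0, which reflects the positive dependence induced by the shared arrival stream.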

7.2  More  general  bounds.  Relation  to  resequencing 

We consider now the case of more general AFJQN's. For these networks, we show that Theorems 11 and 12 can be used to provide computable upper bounds which relate to resequencing models analyzed earlier in [BGP 84]. The discussion of these bounds will be limited to steady state. It is assumed that each queue satisfies the assumptions of Theorem 12, so that the distribution functions $D_n^j$ (resp. $\hat{D}_n^j$), $j = 1, \ldots, B$, converge weakly to a proper distribution function $D_\infty^j$ (resp. $\hat{D}_\infty^j$) when $n$ goes to $\infty$ and $D_\infty^j \le_{st} \hat{D}_\infty^j$ for all $1 \le j \le B$. Denoting as $\Sigma^j$ (resp. $T^-$) the common distribution function of the RV's $\{\sigma_n^j\}$, $j = 1, \ldots, B$ (resp. $\{-\tau_n\}$), it follows from equation (5.2.7) that the distribution functions $\hat{D}_\infty^j$, $j = 1, \ldots, B$, satisfy the set of equations (7.2.1)-(7.2.3) below:

$\hat{D}_\infty^j = U . (\hat{D}_\infty^j * \Sigma^j * T^-)$ (7.2.1)

for $j$ such that $p(j) = \emptyset$ and

$\hat{D}_\infty^j = A^j . (\hat{D}_\infty^j * \Sigma^j * T^-)$ (7.2.2)

for $j$ such that $p(j) \ne \emptyset$, where

$A^j = \prod_{i \in p(j)} \hat{D}_\infty^i * \Sigma^i.$ (7.2.3)


This set of functional equations can be solved recursively as follows:

First compute the solution $\hat{D}_\infty^j$ of equation (7.2.1) for all $1 \le j \le B_0$. This equation is the functional equation satisfied by the distribution function of the stationary waiting times in a GI/GI/1 queue with service times distributed according to $\Sigma^j$ and (negative) interarrival times according to $T^-$.

Next, compute $\hat{D}_\infty^j$, $j > B_0$, by induction as follows. Assume that the distribution functions $\hat{D}_\infty^i$, $1 \le i < j$, are known for some $j > B_0$. Notice first that this and equation (7.2.3) fully determine the distribution function $A^j$ on $R^+$. Hence, the only unknown in equation (7.2.2) is $\hat{D}_\infty^j$. This equation is the functional equation satisfied by the distribution function of the stationary end-to-end delays in a GI/GI/GI/1 resequencing queue as considered in [BGP 84], with disordering times distributed according to $A^j$, service times distributed according to $\Sigma^j$ and (negative) interarrival times according to $T^-$.

The end of this section is devoted to computational problems related to the solution of these functional equations. General techniques for solving (7.2.1) are well known (see for instance [Co 85] for a detailed discussion).

We consider now equation (7.2.2), the general form of which is

$D = A . (D * \Sigma * T^-),$ (7.2.4)

where $A$, $\Sigma$ and $T^-$ are known distribution functions on $R$ with their support on $R^+$, $R^+$ and $R^-$ respectively, $C = \Sigma * T^-$ has a negative mean and $D$ is the unknown distribution function on $R^+$. Closed form solutions of (7.2.4) have been derived in [BGP 84] for certain classes of distribution functions $A$ and $T^-$, namely $A$ hyperexponential and $T^-$ exponential. For more general classes of distribution functions, it is established in Appendix 3 that the following numerical schema converges towards the solution of (7.2.4):

$F_{n+1}(t) = A(t) \int_{R} F_n(t - u)\,dC(u), \quad n \ge 0,\ t \in R,$ (7.2.5)

where $C = \Sigma * T^-$ and

$F_0 = A.$ (7.2.6)

Here, the functions $F_n(t)$, $n \ge 0$, are distribution functions on $R$ with support on $R^+$, and the convergence of $F_n$ towards the solution of (7.2.4) has to be understood in the sense of weak convergence.
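A lattice version of the numerical schema (7.2.5)-(7.2.6) can be sketched as follows. This is an illustration under assumptions, not the report's implementation: all distributions live on the integers, the iterates are truncated at a point T beyond which they are treated as equal to 1, and the function name `iterate_scheme` and the sample data are hypothetical.

```python
def iterate_scheme(a_cdf, c_pmf, c_offset, steps):
    """Iterate F_{n+1}(t) = A(t) * sum_u F_n(t - u) * P[C = u], with F_0 = A.

    a_cdf[t] is A(t) for t = 0..T (A vanishes for t < 0);
    c_pmf[k] is P[C = c_offset + k], where C = sigma - tau may be negative.
    Beyond the truncation point T every F_n is treated as equal to 1.
    """
    T = len(a_cdf) - 1

    def value(f, t):
        if t < 0:
            return 0.0
        return f[t] if t <= T else 1.0

    f = list(a_cdf)
    history = [f]
    for _ in range(steps):
        f = [a_cdf[t] * sum(p * value(f, t - (c_offset + k))
                            for k, p in enumerate(c_pmf))
             for t in range(T + 1)]
        history.append(f)
    return history


# Hypothetical data: A uniform on {0,...,3}; C = sigma - tau in {-2, -1, 0, 1}.
A = [min(1.0, (t + 1) / 4) for t in range(41)]
C_PMF, C_OFFSET = [0.3, 0.3, 0.2, 0.2], -2     # mean of C is -0.7 < 0
HISTORY = iterate_scheme(A, C_PMF, C_OFFSET, steps=200)
```

Each iterate lies pointwise below the previous one, so the sequence of distribution functions decreases to its limit. In this example successive iterates agree to within rounding well before 200 steps, and `HISTORY[-1]` can be read as an approximation of the fixed point.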

In conclusion, Theorems 11 and 12 provide a general method to compute upper bounds on the stationary delays through AFJQN's with i.i.d. constituting sequences. The computation of these bounds reduces to determining $B_0$ stationary waiting time distribution functions of GI/GI/1 queues and $B - B_0$ stationary end-to-end delay distribution functions in GI/GI/GI/1 resequencing queues.

Appendix  1 

The basic idea for proving theorem 2 consists in generalizing the schema of Loynes for the response time of a G/G/1 queue ([Lo 62]) to the response times of our network. Let us first consider the sequences $\{\tau_n\}_0^\infty$ and $\{\sigma_n^j\}_0^\infty$, for all $j$, $1 \le j \le B$, as the right half of certain bi-infinite sequences $\{\tau_n\}_{-\infty}^{\infty}$ and $\{\sigma_n^j\}_{-\infty}^{\infty}$ on $(\Omega, F, P)$. We shall assume that $(\Omega, F, P)$ is the canonical space. Hence $P$ will be assumed to be $\theta$-invariant (stationary) and $\theta$-ergodic. Let us denote by $\tau$ the difference $a_1 - a_0$, and by $\sigma^j$ the variable $\sigma_0^j$. Consider now the schema $\{\delta_n^j\}_0^\infty$ defined by $\delta_0^j = d_0^j$ and for $n \ge 0$:

$\delta_{n+1}^j \circ \theta = \max\bigl(\max_{i \in p(j)} ((\delta_{n+1}^i + \sigma^i) \circ \theta),\ \delta_n^j + \sigma^j - \tau\bigr).$ (A.1.1)

Lemma  1 

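For any $j$, $1 \le j \le B$, the sequence $\{\delta_n^j\}_{n \ge 0}$ is increasing.

Proof

Let us first prove this for $1 \le j \le B_0$. It is clear that $\delta_1^j \ge 0 = \delta_0^j$. Assume now that $\delta_n^j \ge \delta_{n-1}^j$ for some $n \ge 1$. From (A.1.1), we get:

$\delta_{n+1}^j \circ \theta = \max(0,\ \delta_n^j + \sigma^j - \tau) \ \ge\ \max(0,\ \delta_{n-1}^j + \sigma^j - \tau) = \delta_n^j \circ \theta, \quad 1 \le j \le B_0.$ (A.1.2)

By induction, the $\delta_n^j$ are thus increasing.

Now consider $j$ such that $p(j) \ne \emptyset$. By the induction hypothesis, we can assume that the RV's $\delta_n^i$ are increasing in $n$ for $i \in p(j)$. We prove first that $\delta_1^j \ge \delta_0^j$. We have

$\delta_1^j \circ \theta = \max\bigl(\max_{i \in p(j)} ((\delta_1^i + \sigma^i) \circ \theta),\ \delta_0^j + \sigma^j - \tau\bigr) \ \ge\ \max_{i \in p(j)} ((\delta_1^i + \sigma^i) \circ \theta) \ \ge\ \max_{i \in p(j)} ((\delta_0^i + \sigma^i) \circ \theta),$ (A.1.3)

where we have used our assumption $\delta_1^i \ge \delta_0^i$. Notice that the last expression is $\delta_0^j \circ \theta$, so that the property is proved. Assuming now that $\delta_n^j \ge \delta_{n-1}^j$, by (A.1.1) we get

$\delta_{n+1}^j \circ \theta \ \ge\ \max\bigl(\max_{i \in p(j)} ((\delta_{n+1}^i + \sigma^i) \circ \theta),\ \delta_{n-1}^j + \sigma^j - \tau\bigr).$ (A.1.4)

Since the $\delta_n^i$ are increasing for $i \in p(j)$, we get from the last expression that

$\delta_{n+1}^j \circ \theta \ \ge\ \max\bigl(\max_{i \in p(j)} ((\delta_n^i + \sigma^i) \circ \theta),\ \delta_{n-1}^j + \sigma^j - \tau\bigr) = \delta_n^j \circ \theta$ (A.1.5)

and so $\delta_n^j$ increases in $n$.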
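For a single queue without predecessors, the monotonicity stated in Lemma 1 can be observed pathwise with a short simulation. This is a sketch under assumptions: the increments stand for sigma - tau along one sample path read backwards in time, the exponential laws and the function name `loynes_sequence` are hypothetical, and delta_n is computed as the delay seen at time 0 when the queue is started empty n steps in the past.

```python
import random


def loynes_sequence(xi):
    """delta_n for n = 0..len(xi), for one queue without predecessors.

    xi[k] plays the role of sigma - tau shifted k + 1 steps into the past;
    delta_n is the delay seen at time 0 when the queue starts empty n steps earlier.
    """
    deltas = [0.0]
    for n in range(1, len(xi) + 1):
        w = 0.0
        for k in range(n - 1, -1, -1):      # oldest increment first
            w = max(w + xi[k], 0.0)
        deltas.append(w)
    return deltas


rng = random.Random(7)                      # fixed seed, for reproducibility
# Hypothetical increments: service mean 1.0, interarrival mean 1.25 (stable queue).
XI = [rng.expovariate(1.0) - rng.expovariate(0.8) for _ in range(200)]
DELTAS = loynes_sequence(XI)
```

The list `DELTAS` is non-decreasing on every sample path, because starting one step earlier can only add work to the queue. With a negative mean increment the sequence eventually stops growing, which is the finiteness addressed in the next lemma.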

Lemma  2 

Let $\delta_\infty^j$ be the limiting value of the increasing sequence $\delta_n^j$ when $n$ goes to infinity. Under the assumptions of theorem 2, $\delta_\infty^j < \infty$. If there exists an $i \in \pi(j)$ such that $E[\sigma_n^i] > E[\tau_n]$, then $\delta_\infty^j = \infty$ a.s.

Proof

The limiting variables satisfy the pathwise equation:

$\delta_\infty^j \circ \theta = \max\bigl(\max_{i \in p(j)} ((\delta_\infty^i + \sigma^i) \circ \theta),\ \delta_\infty^j + \sigma^j - \tau\bigr).$ (A.1.6)

For $1 \le j \le B_0$, (A.1.6) reduces to

$\delta_\infty^j \circ \theta = \max(0,\ \delta_\infty^j + \sigma^j - \tau).$ (A.1.7)

Equation (A.1.7) shows that the event $\{\delta_\infty^j = \infty\}$ is $\theta$-invariant. Therefore, this event is either of probability 0 or 1. Assume that it is of probability 1. By the increasingness property we have

$E[\max(0,\ \delta_n^j + \sigma^j - \tau) - \delta_n^j] = E[\delta_{n+1}^j \circ \theta - \delta_n^j] = E[\delta_{n+1}^j - \delta_n^j] \ge 0.$ (A.1.8)

From this we get

$\lim_{n \to \infty} E[\max(0,\ \delta_n^j + \sigma^j - \tau) - \delta_n^j] \ge 0.$ (A.1.9)

Using now Lebesgue's theorem, this inequality is preserved with the limit taken inside the expectation. If we assume that $\delta_n^j \uparrow \infty$, we then get

$E[\sigma^j] \ge E[\tau].$ (A.1.10)

Now taking the contrapositive of this argument, we see that

$E[\sigma^j] < E[\tau]$ (A.1.11)

is sufficient to have $\delta_\infty^j$ finite a.e. This completes the proof of the first part of the lemma for $1 \le j \le B_0$.

Let $j$ be such that $B_0 < j \le B$. Assume now that for all $i \in \pi(j)$, $\delta_\infty^i$ is a.e. finite and integrable. The proof that condition (A.1.11) entails $\delta_\infty^j$ finite a.e. proceeds as follows. The event $\{\delta_\infty^j = \infty\}$ is shown to be $\theta$-invariant from (A.1.6). The inequality

$\limsup_{n \to \infty} E\bigl[\max\bigl(\max_{i \in p(j)} ((\delta_{n+1}^i + \sigma^i) \circ \theta),\ \delta_n^j + \sigma^j - \tau\bigr) - \delta_n^j\bigr] \ge 0$ (A.1.12)

is then established using the increasingness of $\delta_n^j$ and its integrability as in (A.1.8). One also gets from elementary manipulations that

$X_n = \max\bigl(\max_{i \in p(j)} ((\delta_{n+1}^i + \sigma^i) \circ \theta),\ \delta_n^j + \sigma^j - \tau\bigr) - \delta_n^j \ \le\ \max_{i \in p(j)} ((\delta_{n+1}^i + \sigma^i) \circ \theta) + |\sigma^j - \tau|.$ (A.1.13)

From the increasingness of $\delta_n^i$, $i \in p(j)$, we get hence

$X_n \le \max_{i \in p(j)} ((\delta_\infty^i + \sigma^i) \circ \theta) + |\sigma^j - \tau|.$ (A.1.14)

Owing to the integrability assumptions, it follows from (A.1.14) that the RV's $X_n$ are uniformly bounded from above by an integrable RV. The Fatou-Lebesgue lemma and (A.1.12) then entail

$E[\limsup_n X_n] \ge \limsup_n E[X_n] \ge 0.$ (A.1.15)

Under the assumption $\delta_\infty^i < \infty$ a.e. for all $i \in \pi(j)$, the hypothesis $\delta_n^j \uparrow \infty$ implies that

$\limsup_n X_n = \sigma^j - \tau,$ (A.1.16)

so that queue $j$ satisfies condition (A.1.10). The rest of the proof follows exactly as before.

Proof  of  Theorem  2 

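We get by induction that $d_n^j = \delta_n^j \circ \theta^n$ (use the fact that $\tau_n = \tau \circ \theta^n$ and $\sigma_n^j = \sigma_0^j \circ \theta^n$, $n \ge 0$). Hence $d_n^j$ and $\delta_n^j$ have the same distribution due to the $\theta$-invariance of $P$. The weak convergence of the law of $d_n^j$ to a proper distribution is now a direct consequence of the a.e. increasing convergence of $\delta_n^j$ to the finite random variable $\delta_\infty^j$.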

Appendix  2 

1 - A stationary queueing system where an increased variability of interarrivals decreases the variability of interdeparture times.

Consider a GI/M/1 queue. The steady state distribution for the number of customers just after a departure is geometrically distributed with parameter $\sigma$ which is the smallest positive real root of the equation

$\sigma = A^*(\mu(1 - \sigma)),$ (A.2.1)

where $A^*$ denotes the Laplace transform of the interarrival times and $\mu^{-1}$ the mean service time. The interdeparture distribution function has hence the following Laplace transform

$D^*(s) = (1 - \sigma) \sum_{k \ge 1} \sigma^k \frac{\mu}{\mu + s} + (1 - \sigma) A^*(s) \frac{\mu}{\mu + s}$ (A.2.2)

$= (1 - \sigma) A^*(s) \frac{\mu}{\mu + s} + \sigma \frac{\mu}{\mu + s}.$ (A.2.3)

The mean interdeparture time is hence

$d = \frac{1}{\mu} + \frac{1 - \sigma}{\lambda}$ (A.2.4)

where $\lambda^{-1}$ denotes the mean interarrival time. Consider the two cases where $A^*$ is exponential and deterministic with the same mean $\lambda^{-1}$:

$A_1^*(s) = \frac{\lambda}{\lambda + s}$ (A.2.5)

$A_2^*(s) = \exp(-s/\lambda).$ (A.2.6)

The distribution function corresponding to $A_1^*$ is larger for convex ordering than the one corresponding to $A_2^*$. However, $\sigma_1 > \sigma_2$ so that $d_1 < d_2$.
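The comparison of the two roots can be reproduced numerically. The sketch below is an illustration under assumptions: it takes the root equation of the GI/M/1 queue to be sigma = A*(mu (1 - sigma)), uses the Laplace transforms lambda / (lambda + s) for exponential interarrivals and exp(-s / lambda) for deterministic ones, and picks hypothetical rates lambda = 0.7 and mu = 1. The function name `smallest_root` is not from the report.

```python
import math


def smallest_root(laplace, mu, iterations=10_000):
    """Smallest positive root of sigma = A*(mu * (1 - sigma)), found by
    iterating the map from sigma = 0 (the iterates increase to that root)."""
    sigma = 0.0
    for _ in range(iterations):
        sigma = laplace(mu * (1.0 - sigma))
    return sigma


LAM, MU = 0.7, 1.0                                           # hypothetical rates, LAM < MU
sigma_exp = smallest_root(lambda s: LAM / (LAM + s), MU)     # exponential interarrivals
sigma_det = smallest_root(lambda s: math.exp(-s / LAM), MU)  # deterministic interarrivals
```

For exponential interarrivals the root equals lambda / mu, and the deterministic case gives a strictly smaller root, which is the ordering of the two parameters used in the argument above. Starting the iteration at 0 matters, because sigma = 1 is always a root of the equation and would otherwise be a possible limit.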

2 - A stationary queueing system where an increased variability of interarrivals increases the variability of interdeparture times.

Consider a stable D/D/1 queue. Let $\lambda$ denote the intensity of the arrival process. The stationary interdeparture times have a deterministic distribution with mean $\lambda^{-1}$. Here, an increased variability of interarrivals increases the variability of interdeparture times.

Appendix  3 

In this section, weak convergence of distribution functions on $R$ will be denoted as $\Rightarrow$. We establish first that under the assumptions of Theorem 12

$D_n^j \Rightarrow D_\infty^j$ (A.3.1)

and

$\hat{D}_n^j \Rightarrow \hat{D}_\infty^j,$ (A.3.2)

when $n$ goes to $\infty$, where $D_\infty^j$ and $\hat{D}_\infty^j$ are proper distribution functions on $R^+$. We establish the convergence (A.3.2) first. The property is first proved for $j$ such that $p(j) = \emptyset$. For such a $j$, $\hat{D}_n^j$ represents the distribution function of the n-th waiting time in a GI/GI/1 FIFO queue and classical results in queueing theory [Co 85] can be used to establish (A.3.2) provided $E[\sigma_n^j] < E[\tau_n]$.

The convergence (A.3.2) is now established by induction for all $B_0 < j \le B$. Assume queues $1, \ldots, j - 1$ to be in steady state for some $j$ such that $B_0 < j \le B$. Then equations (5.2.6) and (5.2.7) read respectively

$\hat{D}_0^j = A^j$ (A.3.3)

and

$\hat{D}_{n+1}^j = A^j . (\hat{D}_n^j * \Sigma^j * T^-), \quad n \ge 0,$ (A.3.4)

where

$A^j = \prod_{i \in p(j)} \hat{D}_\infty^i * \Sigma^i.$ (A.3.5)
Let $\{a_n^j\}_{-\infty}^{\infty}$, $\{\sigma_n^j\}_{-\infty}^{\infty}$ and $\{-\tau_n\}_{-\infty}^{\infty}$ be independent sequences of i.i.d. RV's with respective distribution functions $A^j$, $\Sigma^j$ and $T^-$. Consider the $R^+$-valued Markov chain $\{y_n^j\}_0^\infty$ defined by the recursion

$y_{n+1}^j = \max(a_{n+1}^j,\ y_n^j + \sigma_n^j - \tau_n), \quad n \ge 0,$ (A.3.6)

where

$y_0^j = a_0^j.$ (A.3.7)

Using the independence assumptions, it is plain from (A.3.3)-(A.3.5) that $df(y_n^j) = \hat{D}_n^j$ for all $n \ge 0$.

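Denote $a_0^j$, $\sigma_0^j$ and $\tau_0$ as $a^j$, $\sigma^j$ and $\tau$ respectively. Using the same formalism as in Appendix 1, define the Loynes' schema $\{z_n^j\}_0^\infty$ by the recursion

$z_{n+1}^j \circ \theta = \max(a^j \circ \theta,\ z_n^j + \sigma^j - \tau), \quad n \ge 0,$ (A.3.8)

where

$z_0^j = a^j.$ (A.3.9)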

One proves as in Appendix 1 that $z_n^j$ increases pathwise with $n$, $z_n^j \circ \theta^n = y_n^j$ for all $n \ge 0$ and

$z_{n+1}^j \circ \theta - z_n^j \le a^j \circ \theta + |\sigma^j - \tau|.$ (A.3.10)

The integrability assumptions are then used in (A.3.10) to prove that the RV's $\{z_n^j\}_0^\infty$ are bounded from above by an integrable RV. The remainder of the proof is as in Appendix 1.

The numerical schema (7.2.5)-(7.2.6) is a mere rephrasing of equations (A.3.3)-(A.3.5), so that its convergence towards the solution of (7.2.4) is a direct consequence of (A.3.2).

We prove now the convergence (A.3.1). It was established in Theorem 11 that under the assumption $H_1$

$D_n^j \le_{st} \hat{D}_n^j, \quad n \ge 0,\ j = 1, \ldots, B.$ (A.3.11)

It follows from the discussion of Appendix 1 that

$D_n^j = df(\delta_n^j), \quad n \ge 0,\ j = 1, \ldots, B.$ (A.3.12)

Hence, the convergence (A.3.2) of $\hat{D}_n^j$ towards a finite distribution function, used in (A.3.11), entails that the increasing sequence $\delta_n^j$ cannot converge to $\infty$ almost surely, which establishes (A.3.1).